(57) Abstract: The invention relates to a method for transmitting video information, in which at least a first bit-stream (510) and
a second bit-stream (520) are formed. The first bit-stream (510) comprises at least one video frame, and the second bit-stream (520)
comprises at least one predictive video frame (524). At least partly different encoding parameters are used in encoding the
frames of said first bit-stream (510) and said second bit-stream (520). At least one frame of said first bit-stream (510) is being
transmitted, and the transmission is switched over from said first (510) to said second bit-stream (520). In switching the transmission
over from said first (510) to said second bit-stream (520), a secondary switching frame (550) is transmitted, which has been encoded
using the encoding parameters of the second bit-stream (520) and at least one reference frame from the first bit-stream (510). Said
secondary switching frame (550) is used as a reference frame in the reconstruction of said at least one predictive video frame (524)
of the second set of video frames. The invention also relates to an encoder for encoding video information, a decoder for decoding
video information, and a signal representing encoded video information.
SWITCHING BETWEEN BIT-STREAMS IN VIDEO TRANSMISSION



The present invention relates to a method for transmitting video
information, in which at least a first bit-stream and a second bit-stream
are formed from the video information, the first bit-stream comprises a
first set of frames comprising at least one video frame, and the second
bit-stream comprises a second set of frames comprising at least one
predictive video frame, at least partly different encoding parameters are
used in encoding the frames of said first bit-stream and said
second bit-stream, at least one frame of said first bit-stream is being
transmitted, and the transmission is switched over from said first
bit-stream to said second bit-stream. The invention also relates to an
encoder comprising means for forming at least a first bit-stream and a
second bit-stream from video information, the first bit-stream
comprises a first set of frames comprising at least one video frame, and
the second bit-stream comprises a second set of frames comprising at
least one predictive video frame, and means for using at least partly
different encoding parameters in encoding the frames of said first
bit-stream and said second bit-stream, means for transmitting at least
one frame of said first bit-stream, and means for switching the
transmission over from said first bit-stream to said second bit-stream.
The invention further relates to a decoder for decoding video
information from a signal comprising frames from at least a first
bit-stream and a second bit-stream, which are formed from the video
information, the first bit-stream comprises a first set of frames
comprising at least one video frame, and the second bit-stream
comprises a second set of frames comprising at least one predictive
video frame, and at least partly different encoding parameters are used
in encoding the frames of said first bit-stream and said second
bit-stream. The invention further relates to a signal representing encoded
video information, and comprising frames from at least a first bit-stream
and a second bit-stream, which are formed from the video information,
the first bit-stream comprises a first set of frames comprising at least
one video frame, and the second bit-stream comprises a second set of
frames comprising at least one predictive video frame, at least partly




CA 02431866 2003-06-13 




different encoding parameters are used in encoding the frames of
said first bit-stream and said second bit-stream.

BACKGROUND 

Recently, multimedia applications that include streaming audio and 
video information have come into greater use. Several international 
standardization organizations have established and proposed 
standards for compressing/encoding and decompressing/decoding
audio and video information. MPEG standards, established by the
Moving Picture Experts Group, are the most widely accepted
international standards in the field of multimedia applications.
VCEG is the "Video Coding Experts Group", working under the 
direction of the ITU Telecommunication Standardization Sector (ITU-T). 
This group works on the standard H.26L for the coding of moving 
pictures. 

A typical video stream comprises a sequence of pictures, often referred
to as frames. The frames comprise pixels arranged in a rectangular
form. In existing video coding standards, such as H.261, H.262, H.263,
H.26L and MPEG-4, three main types of pictures are defined: Intra
frames (I-frames), Predictive frames (P-frames) and Bi-directional
frames (B-frames). Each picture type exploits a different type of
redundancy in a sequence of images and consequently results in a
different level of compression efficiency and, as explained in the
following, provides different functionality within the encoded video
sequence. An intra frame is a frame of video data that is coded by
exploiting only the spatial correlation of the pixels within the frame itself,
without using any information from past or future frames. Intra
frames are used as the basis for decoding/decompressing other
frames and provide access points to the coded sequence where
decoding can begin.

A predictive frame is a frame that is encoded/compressed using motion 
compensated prediction from a so-called reference frame, i.e. one or 
more previous/subsequent Intra frames or Predictive frames available 
in an encoder or in a decoder. A bi-directional frame is a frame that is
encoded/compressed by prediction from a previous Intra frame or
Predictive frame and/or a subsequent Intra frame or Predictive frame.

Since adjacent frames in a typical video sequence are highly
correlated, higher compression can be achieved by using Bi-directional
or Predictive frames instead of Intra frames. On the other
hand, when temporal predictive coding is employed within the
coded video stream, B-frames and/or P-frames cannot be decoded
without correctly decoding all the previous and/or subsequent
reference frames that were used in coding the Bi-directional and
Predictive frames. In situations in which the reference frame(s) used in
the encoder and the respective reference frame(s) in the decoder are not
identical, either due to errors during transmission or due to some
intentional action on the transmitting side, the subsequent frames that
make use of prediction from such a reference frame cannot be
reconstructed on the decoding side to yield a decoded frame identical
to that originally encoded on the encoding side. This mismatch is not
confined to a single frame but propagates further in time due to
the use of motion compensated coding.

Figs. 1A-1C illustrate the types of encoded/compressed video frames
used in a typical video encoding/decoding system. Advantageously,
prior to encoding, the pictures of the video sequence are represented
by three matrices of multiple-bit numbers, one representing the
luminance (brightness) of the image pixels, and the other two each
representing a respective one of two chrominance (colour)
components. Fig. 1A depicts the way in which an Intra frame 200 is
encoded using only image information present in the frame itself. Fig.
1B illustrates construction of a Predictive frame 210. Arrow 205a
represents the use of motion compensated prediction to create the
P-frame 210. Fig. 1C depicts construction of Bi-directional frames 220.
B-frames are usually inserted between I-frames or P-frames. Fig. 2
represents a group of pictures in display order and illustrates how
B-frames are inserted between I- and P-frames, as well as showing the
direction in which motion compensation information flows. In Figures
1B, 1C and 2, arrows 205a depict forward motion compensation
prediction information necessary to reconstruct P-frames 210, whereas
arrows 215a and 215b depict motion compensation information used in
reconstructing B-frames 220 in the forward direction (215a) and the
backward direction (215b). In other words, the arrows 205a and 215a show the
flow of information when predictive frames are predicted from frames
that are earlier in display order than the frame being reconstructed, and
arrows 215b show the flow of information when predictive frames are
predicted from frames that are later in display order than the frame
being reconstructed.

In motion compensated prediction, the similarity between successive 
frames in a video sequence is utilized to improve coding efficiency. 
More specifically, so-called motion vectors are used to describe the 
way in which pixels or regions of pixels move between successive 
frames of the sequence. The motion vectors provide offset values
that refer to a past or a future frame of video data; the decoded pixel
values at the indicated positions are used together with prediction error
data to compress/encode or decompress/decode a given frame of video data.

Decoding/decompressing a P-frame requires the availability
of the previous I- or P-reference frame; decoding a B-frame
additionally requires the availability of the subsequent I- or P-reference
frame. For example, if an encoded/compressed data
stream has the following frame sequence or display order:

$$I_1\ B_2\ B_3\ P_4\ B_5\ P_6\ B_7\ P_8\ B_9\ B_{10}\ P_{11}\ \ldots\ P_{n-3}\ B_{n-2}\ P_{n-1}\ I_n,$$

the corresponding decoding order is: 

$$I_1\ P_4\ B_2\ B_3\ P_6\ B_5\ P_8\ B_7\ P_{11}\ B_9\ B_{10}\ \ldots\ P_{n-1}\ B_{n-2}\ I_n.$$

The decoding order differs from the display order because the
B-frames require future I- or P-frames for their decoding. Fig. 2 displays
the beginning of the above frame sequence and can be referred to in
order to understand the dependencies of the frames, as described
earlier. P-frames require the previous I- or P-reference frame to be
available. For example, $P_4$ requires $I_1$ to be decoded. Similarly, frame
$P_6$ requires that $P_4$ be available in order to decode/decompress frame
$P_6$. B-frames, such as frame $B_3$, require a past and/or a future I- or
P-reference frame, such as $I_1$ and $P_4$, in order to be decoded. B-frames
are placed between I- or P-frames during encoding.
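The reordering rule described above can be sketched in a few lines. This is a minimal illustration, assuming that every frame label starts with its type letter and that each B-frame is predicted only from the nearest I- or P-frame on either side; the function name `decoding_order` is hypothetical.

```python
from typing import List


def decoding_order(display: List[str]) -> List[str]:
    """Reorder frame labels from display order into a valid decoding order.

    Assumes each label starts with its type ('I', 'P' or 'B') and that every
    B-frame references the nearest I- or P-frame on each side of it.
    """
    order: List[str] = []
    waiting_b: List[str] = []  # B-frames still waiting for their future reference
    for frame in display:
        if frame.startswith("B"):
            waiting_b.append(frame)
        else:
            # The I- or P-frame is decoded first, followed by the B-frames that
            # precede it in display order, since they predict from it.
            order.append(frame)
            order.extend(waiting_b)
            waiting_b.clear()
    order.extend(waiting_b)  # only reached if the sequence ends in B-frames
    return order


print(decoding_order("I1 B2 B3 P4 B5 P6 B7 P8 B9 B10 P11".split()))
```

Applied to the start of the example sequence, this reproduces the decoding order given above, with each I- or P-frame moved ahead of the B-frames that reference it.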

A prior art system for encoding and decoding is shown in Figures 3 and
4. Referring to the encoder 300 of Figure 3, the frame 301 being coded,
$I(x,y)$, called the current frame, is partitioned into rectangular regions of
$K \times L$ pixels. The coordinates $(x,y)$ denote the location of the pixels
within the frame. Each block is encoded using either intra coding (i.e.
using only the spatial correlation of the image data within the block) or
inter coding (i.e. utilising both spatial and temporal prediction). The
following description considers the process by which inter-coded blocks
are formed. Each inter-coded block is predicted 360 from one of the
previously (or subsequently) coded and transmitted frames $R(x,y)$ in
the frame memory 350, called a reference frame. The motion
information used for the prediction is obtained from the motion estimation
and coding block 370 by using the reference frame and the current
frame 305. The motion information is represented by a two-dimensional
motion vector $(\Delta x, \Delta y)$, where $\Delta x$ is the horizontal and
$\Delta y$ the vertical displacement. In the motion compensated (MC)
prediction block, the motion vectors are used together with the reference
frame to construct the prediction frame $P(x,y)$:

$$P(x,y) = R(x+\Delta x,\, y+\Delta y).$$

Subsequently the prediction error E(x,y), i.e. the difference between the 
current frame and the prediction frame P(x,y), is calculated 307 
according to 

$$E(x,y) = I(x,y) - P(x,y).$$
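The prediction and prediction-error steps can be illustrated for a single block. The sketch below is a simplification that assumes integer-pel motion vectors, frames stored as `frame[y][x]`, and a block width of K and height of L. The full-search, sum-of-absolute-differences motion estimation in `full_search` is only one common way a block such as motion estimation block 370 might work, and it is not taken from the text. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x]; integer-pel motion only (simplifying assumption)


def predict_block(ref: Frame, x0: int, y0: int, K: int, L: int,
                  mv: Tuple[int, int]) -> Frame:
    """P(x, y) = R(x + dx, y + dy) for the K x L block whose top-left is (x0, y0)."""
    dx, dy = mv
    return [[ref[y0 + y + dy][x0 + x + dx] for x in range(K)] for y in range(L)]


def prediction_error(cur: Frame, pred: Frame) -> Frame:
    """E(x, y) = I(x, y) - P(x, y), sample by sample."""
    return [[c - p for c, p in zip(row_c, row_p)] for row_c, row_p in zip(cur, pred)]


def full_search(cur: Frame, ref: Frame, x0: int, y0: int, K: int, L: int,
                search_range: int) -> Tuple[int, int]:
    """Hypothetical motion estimation: minimise the SAD over a square window.

    Candidate vectors that would read outside the reference frame are skipped.
    """
    block = [row[x0:x0 + K] for row in cur[y0:y0 + L]]
    best_mv: Tuple[int, int] = (0, 0)
    best_sad: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= x0 + dx and x0 + dx + K <= len(ref[0])
                      and 0 <= y0 + dy and y0 + dy + L <= len(ref))
            if not inside:
                continue
            err = prediction_error(block, predict_block(ref, x0, y0, K, L, (dx, dy)))
            sad = sum(abs(v) for row in err for v in row)
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv
```

When the current block is an exact displaced copy of the reference, the search returns that displacement, and the prediction error for the block is zero.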

In transform block 310, the prediction error for each $K \times L$ block is
represented as a weighted sum of transform basis functions $f_{ij}(x,y)$:

$$E(x,y) = \sum_{i=1}^{K} \sum_{j=1}^{L} c_{err}(i,j)\, f_{ij}(x,y).$$
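The weighted-sum representation can be made concrete with one particular choice of basis functions. The sketch below assumes the orthonormal 2-D DCT-II as $f_{ij}$. This is common in block-based codecs, but the text does not specify it, and H.26L itself uses an integer approximation. Indices here are 0-based, whereas the formula above counts from 1. Because this basis is orthonormal, projecting the block onto it gives the coefficients $c_{err}(i,j)$, and the weighted sum reconstructs the block exactly.

```python
import math
from typing import List

Block = List[List[float]]  # block[y][x]: L rows of K samples (assumed layout)


def basis(i: int, j: int, x: int, y: int, K: int, L: int) -> float:
    """Orthonormal 2-D DCT-II basis function f_ij(x, y), all indices 0-based."""
    a_i = math.sqrt((1.0 if i == 0 else 2.0) / K)
    a_j = math.sqrt((1.0 if j == 0 else 2.0) / L)
    return (a_i * math.cos(math.pi * (2 * x + 1) * i / (2 * K))
            * a_j * math.cos(math.pi * (2 * y + 1) * j / (2 * L)))


def forward_transform(E: Block) -> Block:
    """c(i, j) = sum over (x, y) of E(x, y) f_ij(x, y); stored as c[j][i]."""
    L, K = len(E), len(E[0])
    return [[sum(E[y][x] * basis(i, j, x, y, K, L)
                 for y in range(L) for x in range(K))
             for i in range(K)]
            for j in range(L)]


def inverse_transform(c: Block) -> Block:
    """E(x, y) = sum over (i, j) of c(i, j) f_ij(x, y): the weighted sum above."""
    L, K = len(c), len(c[0])
    return [[sum(c[j][i] * basis(i, j, x, y, K, L)
                 for j in range(L) for i in range(K))
             for x in range(K)]
            for y in range(L)]
```

For a constant block, all of the energy lands in the single coefficient $c(0,0)$. This concentration of energy into a few coefficients is what makes the subsequent quantisation effective.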

The weights $c_{err}(i,j)$ corresponding to the basis functions are called
transform coefficients. These coefficients are subsequently quantised
in quantisation block 320 to give

$$l_{err}(i,j) = Q(c_{err}(i,j), QP),$$

where $l_{err}(i,j)$ are the quantised transform coefficients. The quantisation
operation $Q(\cdot)$ introduces loss of information, but the quantised
coefficients can be represented with a smaller number of bits. The level
of compression (loss of information) is controlled by adjusting the value
of the quantisation parameter QP.
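How QP controls the trade-off can be shown with a hypothetical uniform quantiser. The step size of 2·QP and the rounding rule below are illustrative assumptions, not the quantiser of any particular standard. `dequantise` plays the role of $Q^{-1}$, which is used later in the reconstruction loop.

```python
from typing import List


def quantise(c: float, qp: int) -> int:
    """Hypothetical Q(c, QP): uniform step 2*QP, rounding half away from zero."""
    step = 2 * qp
    sign = -1 if c < 0 else 1
    return sign * int(abs(c) / step + 0.5)


def dequantise(level: int, qp: int) -> int:
    """Matching Q^-1(l, QP): multiply the level by the step size."""
    return level * 2 * qp


def quantise_all(coeffs: List[float], qp: int) -> List[int]:
    return [quantise(c, qp) for c in coeffs]


print(quantise_all([37.0, -12.0, 5.0, 1.0], 4))   # finer quantisation
print(quantise_all([37.0, -12.0, 5.0, 1.0], 10))  # coarser: more zero levels
```

A larger QP maps more coefficients to zero or to small levels, which are cheaper to code, at the cost of a larger reconstruction error. That error is at most half a step per coefficient.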

Before the motion vectors and quantised transform coefficients are
supplied to the multiplexer 380, they are further encoded using
Variable Length Codes (VLC). This reduces the number of bits needed
to represent the motion vectors and quantised transform coefficients.
The encoded motion vectors and quantised transform coefficients, as well
as other additional information needed to represent each coded frame,
are multiplexed in the multiplexer 380, and the resulting bit-stream is
transmitted 415 to the decoder. The quantised transform coefficients
are also forwarded to the inverse quantisation block 330 to obtain
inverse quantised transform coefficients and further to the inverse
transform block 340 to obtain the prediction error information $E_c(x,y)$ for
the current frame. The prediction error information $E_c(x,y)$ is added to
the predicted frame $P(x,y)$ in a summing element to obtain a video
frame that can be subsequently stored in the frame memory 350.
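As an example of a variable length code, the sketch below uses Exponential-Golomb codes. These give the shortest codewords to the smallest values, which suits quantised coefficients and motion-vector components that are mostly near zero. This is a substitute example: the text does not specify which VLC tables the encoder 300 uses. The signed mapping shown (1, -1, 2, -2, ... to 1, 2, 3, 4, ...) is the one used for Exp-Golomb codes in H.264.

```python
from typing import Tuple


def ue(n: int) -> str:
    """Unsigned Exp-Golomb codeword: (len - 1) zeros followed by binary(n + 1)."""
    if n < 0:
        raise ValueError("unsigned code requires n >= 0")
    suffix = bin(n + 1)[2:]  # binary representation of n + 1 without the '0b'
    return "0" * (len(suffix) - 1) + suffix


def se(v: int) -> str:
    """Signed values are mapped 1, -1, 2, -2, ... to 1, 2, 3, 4, ... and coded with ue()."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Read one unsigned codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


stream = "".join(ue(n) for n in (0, 3, 1))  # codewords are simply concatenated
print(stream)
```

The codes are prefix-free, so the decoder can split the concatenated bit-stream back into values without separators. It does this by counting leading zeros.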

In the following, decoding of video frames will be described with reference
to Figure 4. The decoder 400 receives the multiplexed video bit-stream
415 from the encoder, and the demultiplexer 410 demultiplexes the
bit-stream to obtain the constituent parts of the video frames to be
decoded. These parts comprise at least the coded quantised prediction
error transform coefficients and the coded motion vectors, which are
subsequently decoded (not shown) to obtain quantised prediction error
transform coefficients and motion vectors. The quantised transform
coefficients are inverse quantised in inverse quantisation block 420 to
obtain inverse quantised transform coefficients $d_{err}(i,j)$ according to the
following relation:

$$d_{err}(i,j) = Q^{-1}(l_{err}(i,j), QP).$$

In inverse transform block 430, the inverse quantised transform
coefficients are subjected to an inverse transform to obtain the prediction
error $E_c(x,y)$:

$$E_c(x,y) = \sum_{i=1}^{K} \sum_{j=1}^{L} d_{err}(i,j)\, f_{ij}(x,y).$$

The pixels of the current coded frame are reconstructed by finding the
prediction pixels in the reference frame $R(x,y)$ obtained from frame
memory 440, and using the received motion vectors together with the
reference frame in motion compensation prediction block 450, to obtain
a predicted frame $P(x,y)$. The predicted frame $P(x,y)$ and the prediction
error information $E_c(x,y)$ are summed together in a summing element
435 according to the following relation:

$$I_c(x,y) = R(x+\Delta x,\, y+\Delta y) + E_c(x,y).$$

The reconstructed values $I_c(x,y)$ are stored in the frame memory 440 and
can further be filtered to obtain decoded video frames 445; such filtering
can be performed in a filtering block (not shown in Figure 4) after the
summation block 435.
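The encoder of Figure 3 and the decoder of Figure 4 perform the same reconstruction step, $I_c = P + E_c$. As a result, the frame memory 350 of the encoder and the frame memory 440 of the decoder hold identical reference frames. The toy sketch below demonstrates this under strong simplifying assumptions: one-dimensional frames, zero motion, an identity transform, an all-zero initial reference, and a hypothetical uniform quantiser with step 2·QP. None of these details come from the text.

```python
from typing import List, Tuple


def quantise(e: int, qp: int) -> int:
    """Hypothetical uniform quantiser, step 2*QP, rounding half away from zero."""
    step = 2 * qp
    sign = -1 if e < 0 else 1
    return sign * int(abs(e) / step + 0.5)


def dequantise(level: int, qp: int) -> int:
    return level * 2 * qp


def encode(frames: List[List[int]], qp: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Toy inter-only encoder (zero motion, identity transform).

    Returns the transmitted levels and the encoder's own reconstructions,
    i.e. the contents of its frame memory after each frame.
    """
    ref = [0] * len(frames[0])  # assumed all-zero initial reference frame
    coded, recon = [], []
    for cur in frames:
        levels = [quantise(c - r, qp) for c, r in zip(cur, ref)]    # E = I - P
        ref = [r + dequantise(q, qp) for r, q in zip(ref, levels)]  # I_c = P + E_c
        coded.append(levels)
        recon.append(ref)
    return coded, recon


def decode(coded: List[List[int]], qp: int) -> List[List[int]]:
    """Decoder: repeats exactly the reconstruction step of the encoder loop."""
    ref = [0] * len(coded[0])
    recon = []
    for levels in coded:
        ref = [r + dequantise(q, qp) for r, q in zip(ref, levels)]
        recon.append(ref)
    return recon
```

The encoder predicts from its own quantised reconstructions rather than from the original frames. This keeps the two frame memories in step, and it is also why any change to the reference frame on one side causes a mismatch.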

Video streaming has emerged as an important application in the fixed
Internet. It is further anticipated that video streaming will also be
important in 3G wireless networks in the near future. In
streaming applications the transmitting server starts transmitting a
pre-encoded video bit stream via a transmission network to a receiver upon
a request from the receiver. The receiver plays the video stream
back while receiving it. The best-effort nature of present networks
causes variations in the effective bandwidth available to a user due to
changing network conditions. To accommodate these variations,
the transmitting server can scale the bit rate of the compressed video.
In the case of a conversational service characterized by real-time
encoding and point-to-point delivery, this can be achieved by adjusting
the source encoding parameters on the fly. Such adjustable
parameters can be, for example, a quantisation parameter or a frame
rate. The adjustment is advantageously based on feedback from the
transmission network. In typical streaming scenarios, in which a previously
encoded video bit stream is transmitted to the receiver, this
solution cannot be applied.

One solution to achieve bandwidth scalability in the case of pre-encoded
sequences is to produce multiple independent streams having
different bit-rates and quality. The transmitting server then dynamically
switches between the streams to accommodate variations in the
available bandwidth. The following example illustrates this principle. Let
us assume that multiple bit streams corresponding to the same video
sequence are generated independently with different encoding
parameters, such as the quantisation parameter. Let
$\{P_{1,n-1}, P_{1,n}, P_{1,n+1}\}$ and $\{P_{2,n-1}, P_{2,n}, P_{2,n+1}\}$
denote the sequences of decoded frames from bit streams 1 and 2,
respectively. Since the encoding parameters are different for the two
bit streams, frames reconstructed from them at the same time instant,
for example frames $P_{1,n-1}$ and $P_{2,n-1}$, are not identical. If it is
now assumed that the server initially sends encoded frames from bit
stream 1 until time $n$, from which point it starts sending encoded frames
from bit stream 2, the decoder receives the frames
$\{P_{1,n-2}, P_{1,n-1}, P_{2,n}, P_{2,n+1}, P_{2,n+2}\}$. In this case $P_{2,n}$
cannot be correctly decoded since its reference frame $P_{2,n-1}$ is not
received. On the other hand, the frame $P_{1,n-1}$, which is received
instead of $P_{2,n-1}$, is not identical to $P_{2,n-1}$. Therefore switching
between bit streams at arbitrary locations leads to visual artefacts due
to the mismatch between the reference frames used for motion
compensated prediction in the different sequences. These visual
artefacts are not only confined to the frame at the switching point
between bit streams, but propagate in time due to the continued motion
compensated coding in the remaining part of the video sequence.
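The mismatch can be reproduced with a toy codec. The sketch assumes single-sample frames, zero motion, an identity transform, an all-zero initial reference, and a hypothetical uniform quantiser with step 2·QP. Bit stream 1 uses QP = 1 and bit stream 2 uses QP = 4, and each coded frame carries its own QP so that the decoder can dequantise frames from either stream. These details are illustrative assumptions only.

```python
from typing import List, Tuple

Coded = Tuple[int, List[int]]  # (QP signalled with the frame, quantised levels)


def quantise(e: int, qp: int) -> int:
    """Hypothetical uniform quantiser, step 2*QP, rounding half away from zero."""
    step = 2 * qp
    sign = -1 if e < 0 else 1
    return sign * int(abs(e) / step + 0.5)


def dequantise(level: int, qp: int) -> int:
    return level * 2 * qp


def encode(frames: List[List[int]], qp: int) -> Tuple[List[Coded], List[List[int]]]:
    """Toy closed-loop P-frame encoder; returns coded frames and encoder reconstructions."""
    ref = [0] * len(frames[0])
    coded: List[Coded] = []
    recon: List[List[int]] = []
    for cur in frames:
        levels = [quantise(c - r, qp) for c, r in zip(cur, ref)]
        ref = [r + dequantise(q, qp) for r, q in zip(ref, levels)]
        coded.append((qp, levels))
        recon.append(ref)
    return coded, recon


def decode(coded: List[Coded]) -> List[List[int]]:
    """Decoder: always predicts from whatever it reconstructed last."""
    ref = [0] * len(coded[0][1])
    recon: List[List[int]] = []
    for qp, levels in coded:
        ref = [r + dequantise(q, qp) for r, q in zip(ref, levels)]
        recon.append(ref)
    return recon


frames = [[10], [13], [17], [22]]
c1, r1 = encode(frames, 1)   # bit stream 1 (fine quantisation)
c2, r2 = encode(frames, 4)   # bit stream 2 (coarse quantisation)
n = 2
received = c1[:n] + c2[n:]   # switch at an ordinary P-frame
out = decode(received)
print(out, r2)
```

After the switch, the decoder produces 14 and 22, whereas the encoder of bit stream 2 assumed 16 and 24. The error does not stop at frame $n$ but carries over to frame $n+1$, which is the propagation described above.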

In current video encoding standards, perfect (mismatch-free) switching
between bit streams is only possible at positions where the current and
future frames or regions thereof do not use any information previous to
the current switching location, i.e., at I-frames. Furthermore, by placing
I-frames at fixed (e.g. 1 s) intervals, VCR functionalities, such as
random access or "Fast Forward" and "Fast Backward" (increased
playback rate) for streaming video content, are achieved. The user may
skip a portion of the video sequence and restart playing at any I-frame
location. Similarly, an increased playback rate can be achieved by
transmitting only I-frames. The drawback of using I-frames in these
applications is that, since I-frames do not exploit any temporal
redundancy, they require a much larger number of bits than P-frames at
the same quality.
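The random access and fast-forward behaviour described above can be sketched as a simple scan over frame types. The sketch assumes a frame-type string such as `"IPPPPPPPPP"`. The example uses a hypothetical rate of 10 frames per second with one I-frame per second. The `access_types` parameter is an illustrative hook for allowing other frame types as access points, and the function names are hypothetical.

```python
from typing import List


def access_point(frame_types: str, target: int, access_types: str = "I") -> int:
    """Index of the last frame at or before `target` where decoding can start."""
    for k in range(target, -1, -1):
        if frame_types[k] in access_types:
            return k
    raise ValueError("no access point at or before the target frame")


def fast_forward(frame_types: str, access_types: str = "I") -> List[int]:
    """Frame indices transmitted for increased playback rate (access points only)."""
    return [k for k, t in enumerate(frame_types) if t in access_types]


types = ("I" + "P" * 9) * 3  # assumed: 10 frames/s, one I-frame per second
print(access_point(types, 15), fast_forward(types))
```

A seek to frame 15 has to restart decoding at the I-frame at index 10. Fast-forward transmits only the I-frames, and each of these costs many more bits than a P-frame of the same quality.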

SUMMARY OF THE INVENTION 

It is an aim of the present invention to provide a new method and a
system for transmitting video images in variable transmission
environments. The invention is based on the idea that correct
(mismatch-free) switching between video streams is enabled by
forming a new type of compressed video frame and inserting frames
of the new type into video bit-streams at locations where switching from
one bit-stream to another is to be allowed. In this description, the new
type of compressed video frame will be referred to generally as an
S-frame. More specifically, S-frames may be classified as SP-frames,
which are formed at the decoder using motion compensated prediction
from already decoded frames using motion vector information, and
SI-frames, which are formed at the decoder using spatial (intra) prediction
from already decoded neighbouring pixels within the frame being
decoded. In general, an S-frame according to the invention is formed
on a block-by-block basis and may comprise both inter-coded (SP)
blocks and intra-coded (SI) blocks. The method according to the
present invention is primarily characterized in that, in switching the
transmission over from said first bit-stream to said second bit-stream,
wherein the second bit-stream comprises at least one primary switching
frame, a secondary switching frame is transmitted, the secondary
switching frame having been encoded using the encoding parameters
of the second bit-stream and at least one reference frame from the first
bit-stream, and said secondary switching frame is used in place of the
primary switching frame as a reference frame to be used in the
reconstruction of said at least one predictive video frame of the second
set of video frames. The encoder according to the present invention is
primarily characterized in that said means for switching the
transmission over from said first bit-stream to said second bit-stream
comprise means for encoding a secondary switching frame using the
encoding parameters of the second bit-stream and reference frames
from the first bit-stream to enable switching the transmission over from
said first bit-stream to said second bit-stream. The decoder according
to the present invention is primarily characterized in that the decoder
comprises means for decoding a secondary switching frame, said
secondary switching frame having been encoded using the encoding
parameters of the second bit-stream and at least one reference frame
from the first bit-stream, and added to said signal in place of a primary
switching frame as a reference frame to be used in the reconstruction
of said at least one predictive video frame of the second set of video
frames, and that said means for decoding a secondary switching frame
comprise means for using decoding parameters of the second
bit-stream and reference frames from the first bit-stream. The signal
according to the present invention is primarily characterized in that, in
switching the transmission over from said first bit-stream to said
second bit-stream, wherein the second bit-stream comprises at least one
primary switching frame, said signal comprises a secondary switching
frame having been encoded using the encoding parameters of the
second bit-stream and at least one reference frame from the first
bit-stream, and said secondary switching frame is used in place of the
primary switching frame as a reference frame to be used in the
reconstruction of said at least one predictive video frame of the second
set of video frames.

Considerable advantages are achieved by the present invention
compared to the methods and systems of the prior art. The invention
allows switching between bit streams to occur not only at the locations
of I-frames but also at the locations of the SP-frames. The coding
efficiency of an SP-frame is much better than the coding efficiency of a
typical I-frame, so that less bandwidth is needed to transmit bit
streams having SP-frames in locations where, according to the prior art,
I-frames would be used, while still providing enough adaptability to
changing transmission conditions. The switching of one bit stream into
another can be performed at locations in which an SP-frame according
to the invention is placed in the encoded bit stream. The images
reconstructed from the bit stream by the decoder are not degraded as a
result of changing from one bit stream to another. The invention also
has the advantage that random access, fast-forward and fast rewind
operations can be performed on the bit stream. The system according
to the invention offers improved error recovery and resiliency properties
compared with the described prior art solutions.

These and other features, aspects, and advantages of embodiments of 
the present invention will become apparent with reference to the 
following description in conjunction with the accompanying drawings. It 
is to be understood, however, that the drawings are designed solely for 
the purposes of illustration and not as a definition of the limits of the 
invention, for which reference should be made to the appended claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

In the following the invention will be described in more detail with 
reference to the appended drawings, in which 

Figs. 1A-1C and 2 are diagrams showing the prior art
encoding/compression of video frames,

Fig. 3 is a block diagram of a generic motion-compensated
predictive video coding system (encoder),

Fig. 4 is a block diagram of a generic motion-compensated
predictive video coding system (decoder),

Fig. 5 is an illustration showing switching between two different bit
streams using S-frames according to the invention,

Fig. 6 is a block diagram of a decoder in accordance with an
advantageous embodiment of the invention,

Fig. 7 is an illustration of random access using S-frames,

Fig. 8 is an illustration of a fast-forward process using S-frames,

Fig. 9 is a block diagram of a decoder in accordance with another
advantageous embodiment of the invention,

Fig. 10 is a block diagram of a decoder in accordance with yet
another advantageous embodiment of the invention,

Fig. 11 is a block diagram of an encoder in accordance with an
advantageous embodiment of the invention,

Fig. 12 is a block diagram of a decoder in accordance with another
advantageous embodiment of the invention,

Fig. 13 is a block diagram of an encoder according to a second
embodiment of the invention,

Fig. 14 is an illustration of an error resiliency/recovery process using
SP-frames,

Fig. 15 is a block diagram of an encoder in accordance with the
third advantageous embodiment of the invention, and

Fig. 16 is a block diagram of a decoder in accordance with yet
another advantageous embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION 

In the following, the invention is described in a system in which multiple
bit streams are formed from a video signal. The video signal can be
any digital video signal comprising multiple images, i.e. an image
sequence. The digital video signal is encoded in an encoder to form
multiple bit streams. Each bit stream is encoded from the same video
signal using at least partly different encoding parameters. For example,
the bit rate can be altered by selecting the encoding parameters
differently, and in this way bit streams with different bit rates can be
formed. The encoding parameters can be, for example, frame rate,
quantisation parameter, spatial resolution, or another factor affecting
the image size, known as such to a person skilled in the art. The
encoder also inserts at least one Intra frame into each bit stream.
At least the first frame of each bit stream is preferably an
Intra frame. This enables the decoder to start reconstruction of the
video signal. The encoder used in the encoding of the I-frames,
P-frames and B-frames can be any prior art encoder which performs the
encoding of the video signal, or there may exist more than one prior art
encoder, each using different encoding parameters to form multiple bit
streams. However, to encode a video signal also containing SP-frames
and/or SI-frames according to the invention, new functionality is
needed in the encoder. This will be explained later in greater detail.

The encoder also inserts frames encoded using motion compensated 
predictive coding (P-frames and optionally B-frames) into the bit 
streams. The encoder also inserts a new type of frames, referred to as 
S-frames in this description, into each bit stream at locations where 
switching between different bit streams will be allowed according to the 
invention. The S-frames may be used at locations where in prior art 
methods an Intra coded frame would be inserted, or the S-frames may 
be used in addition to using Intra coded frames in the video sequence. 
Advantageously, the different bit streams are stored in storage
means for later use. However, it is also possible that transmission
takes place substantially immediately after encoding, in which case it is not
necessary to store complete video sequences; storing the
necessary reference frames suffices. Transmission of the encoded
video stream may be performed e.g. by a transmitting server, which
has means for retrieving the stored bit streams for transmission, and/or
means for receiving the bit streams directly from the encoder. The
transmitting server also has means for transmitting the bit stream to the
transmission network (not shown).

In the following, a method according to an advantageous embodiment 
of the invention will be described. Fig. 5 depicts part of a first bit stream 
510 and part of a second bit stream 520, which are formed in the 
encoder. Only a few P-frames of the respective bit streams are shown. 
Specifically, the first bit stream 510 is shown to comprise P-frames 
511, 512, 514 and 515, while the second bit stream 520 comprises 
corresponding P-frames 521, 522, 524 and 525. Both the first bit 
stream 510 and the second bit stream 520 also comprise S-frames 513
(also labelled $S_1$) and 523 (also labelled $S_2$) at corresponding locations. It
is assumed that the two bit streams 510 and 520 correspond to the 
same sequence encoded at different bit rates for example by using 
different frame rates, different spatial resolutions or different 
quantisation parameters. It is further assumed that the first bit stream 
510 is being transmitted from the transmitting server to a decoder 600, 
1200, 1300 (Figs. 6, 12 and 13, respectively) via a transmission 
network, and that the transmitting server receives a request from the 
transmission network to change the bit rate of the video stream being 
transmitted. 

As mentioned above, S-frames are placed in the bit stream during the
encoding process at those locations within the video sequences where
switching from one bit stream to another is allowed. As can be seen
from Figure 5, in an advantageous embodiment of the invention a
further S-frame 550 (also labelled $S_{12}$) is associated with S-frames $S_1$
and $S_2$. This S-frame is called a secondary representation of the
S-frame (or secondary S-frame for short), and is only transmitted during
bit stream switching. This secondary S-frame $S_{12}$ is generated by a
special encoding of the $n$th frame in the video sequence using the
encoding parameters of the corresponding S-frame 523 ($S_2$) in the
second bit stream 520 and using motion compensated prediction from
the reference frames of the $n$th frame in the first bit stream 510. It
should be noted that in the situation shown in Figure 5, S-frame $S_2$
uses the previously reconstructed frames from the second bit stream
520 as reference frames and the secondary S-frame $S_{12}$ uses the
previously reconstructed frames from the first bit stream 510 as
reference frames. However, the reconstructed pixel values of both $S_2$
and $S_{12}$ are identical. S-frame $S_{12}$ is transmitted only when switching
from the first bit stream 510 to the second bit stream 520 is actually
performed. Therefore it is not necessary to form the secondary
S-frames during the encoding phase but only when switching is
performed. On the other hand, it may be useful to have at least some
secondary S-frames formed earlier, at the time when the different bit
streams are formed, to reduce the computational load during transmission.

When the transmitting server reaches the frame of the video sequence
encoded as S-frame 513 (S1) in the first bit stream 510, it can begin the
necessary operations to continue transmission of the video stream
using the encoded frames of the second bit stream 520. At that point
the transmitting server has already transmitted P-frames 511 and 512
from the first bit stream 510, and the decoder 600, 1200, 1300 has
received and decoded the respective P-frames 511, 512. Those
frames have therefore already been stored in the frame memory 640,
1250, 1360 of the decoder 600, 1200, 1300. The frame memory 640,
1250, 1360 comprises sufficient memory to store all the frames that are
needed to reconstruct a P-frame or a B-frame, i.e. the necessary
information of all the reference frames required by the current frame to
be reconstructed.

The transmitting server performs the following operations to continue
the transmission of the video stream using the encoded frames of the
second bit stream 520. The transmitting server notices, for example by
examining the type information of the frame, that the current frame to
be transmitted is an S-frame, so that switching between the bit streams
is possible. Of course, switching is only performed if a request to do so
has been received or there is some other reason to perform the
switch. The transmitting server inputs the corresponding S-frame 523 of
the second bit stream, uses it to form the secondary S-frame 550 (S12),
and transmits the secondary S-frame S12 to the decoder 600, 1200,
1300. The transmitting server does not transmit the S-frame S2 of the
second bit stream, but sends the secondary S-frame S12 in its place.
The secondary S-frame is formed in such a way that, by decoding the
secondary S-frame S12, the decoder 600 can reconstruct an image
identical to that which would be produced if it were using the
respective frames 521, 522 and the S-frame 523 of the second bit
stream 520. After the transmission of the secondary S-frame, the
transmitting server continues to transmit the encoded frames of the
second bit stream 520, i.e. 524, 525, and so on.
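
The server behaviour described above can be sketched as a simple scheduling loop. This is a minimal illustration under stated assumptions, not the server of the invention: the two bit streams are modelled as equal-length lists of frame records aligned by position in the video sequence, and the names `Frame` and `schedule_switch`, as well as the dictionary holding the secondary S-frames, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    label: str  # hypothetical identifier, e.g. "511" or "S12"
    kind: str   # frame type: "I", "P", "B" or "S"


def schedule_switch(first, second, secondary, request_at):
    """Return the frames sent when switching from `first` to `second`.

    first, second: equal-length lists of Frame indexed by position n in the
        video sequence (an assumed alignment of the two bit streams).
    secondary: dict mapping position n to the secondary S-frame S12 for n;
        it could equally be filled on demand when the switch happens.
    request_at: position at which the switch request has been received.
    """
    sent = []
    switched = False
    for n, frame in enumerate(first):
        if switched:
            sent.append(second[n])       # continue with the second bit stream
        elif n >= request_at and frame.kind == "S":
            sent.append(secondary[n])    # send S12 here instead of S1 or S2
            switched = True
        else:
            sent.append(frame)           # keep sending the first bit stream
    return sent
```

With first = 511, 512, S1, 514, 515 and second = 521, 522, S2, 524, 525, a request arriving at position 1 yields the sequence 511, 512, S12, 524, 525: switching waits for the next S-frame position, as the text requires.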

The S-frames 513, 523, 550 can comprise blocks encoded using only
spatial correlation among the pixels (intra blocks) and blocks encoded
using both spatial and temporal correlation (inter blocks). For each inter
block, the prediction of the block, P(x,y), is formed in the decoder 600,
1200, 1300 using the received motion vectors and a reference frame. The
transform coefficients c_pred for P(x,y) corresponding to basis functions
f_ij(x,y) are calculated and quantised. The quantised values of the
transform coefficients c_pred are denoted l_pred, and the dequantised
values of the quantised transform coefficients l_pred are denoted d_pred.
Quantised coefficients l_err for the prediction error are received from the
encoder. The dequantised values of these coefficients are denoted
d_err. The value of each pixel S(x,y) in the inter block is decoded as a
weighted sum of the basis functions f_ij(x,y), where the weight values
d_rec are called dequantised reconstruction image coefficients. The
values d_rec have to be such that coefficients c_rec exist from which
d_rec can be obtained by quantisation and dequantisation. In addition,
the values d_rec have to fulfil one of the following conditions:

$$d_{\mathrm{rec}} = d_{\mathrm{pred}} + d_{\mathrm{err}}, \quad \text{or}$$

$$c_{\mathrm{rec}} = c_{\mathrm{pred}} + d_{\mathrm{err}}.$$

Values S(x,y) can further be normalized and filtered. 
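
The two admissible conditions on d_rec can be illustrated per coefficient. The sketch below assumes a uniform scalar quantiser whose reconstruction levels are integer multiples of a step size; the function names and the numbers in the example are hypothetical. In both cases the result lies on the dequantiser's grid, i.e. a c_rec exists that quantises and dequantises to it.

```python
import math


def quantise(c, step):
    # Assumed uniform scalar quantiser, rounding half away from zero
    return int(math.copysign(math.floor(abs(c) / step + 0.5), c))


def dequantise(level, step):
    return level * step


def d_rec_first_condition(c_pred, l_err, step):
    # d_rec = d_pred + d_err: both terms are already on the quantiser grid
    d_pred = dequantise(quantise(c_pred, step), step)
    return d_pred + dequantise(l_err, step)


def d_rec_second_condition(c_pred, l_err, step_err, step_rec):
    # c_rec = c_pred + d_err, then quantised and dequantised to obtain d_rec
    c_rec = c_pred + dequantise(l_err, step_err)
    return dequantise(quantise(c_rec, step_rec), step_rec)
```

For example, with c_pred = 13.7, l_err = 2 and step 4, the first condition gives d_rec = 12 + 8 = 20; the second condition with a reconstruction step of 2 gives c_rec = 21.7 and hence d_rec = 22.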

Next, the encoding of the S-frames placed within the bit stream, e.g.
S-frames 513 (S1) and 523 (S2), is described.

Generally, an S-frame according to the invention, such as frames 513
and 523 in Figure 5, is constructed on a block-by-block basis. As
explained earlier, each of the blocks may be coded either in such a way
as to take advantage of the spatial correlations among pixels of the
image being coded (intra or SI-blocks) or in such a way as to take
advantage of the temporal correlation between blocks of pixels in
successive frames of a video sequence (inter or SP-blocks).

The encoding of S-frames according to the invention will be described
with reference to Figure 11, which is a block diagram of an S-frame
encoder 1100 according to a first embodiment of the invention.

A video frame to be encoded in S-frame format is first partitioned into
blocks, and each block is then encoded as either an SP-block, an
SI-block, or an intra-block, an intra-block being known as such from
prior art. Switch 1190 is operated as appropriate to switch between the
SI and SP encoding modes; switch 1190 is an abstract construction
used in the description of the invention, not necessarily a physical
device. In SP-encoding mode, switch 1190 is operated to obtain a motion
compensated prediction for the current block from motion compensated
prediction block 1170. Motion compensated prediction block 1170 forms
a prediction P(x,y) for the current block of the frame being encoded in
a manner analogous to that used in motion compensated prediction
known from prior art. More specifically, motion compensated prediction
block 1170 forms the prediction P(x,y) for the current block of the frame
being encoded by determining a motion vector describing the
relationship between the pixels in the current block and pixel values of
a reconstructed reference frame held in frame memory 1146.
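
Since the text only says the motion vector is found "in a manner analogous to" prior art, the following is one common prior-art choice, shown as an assumption: full-search, integer-pel block matching that minimises the sum of absolute differences (SAD). Frames are plain lists of pixel rows, and `sad` and `motion_search` are hypothetical names.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    # Sum of absolute differences between the current block and a displaced block
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def motion_search(cur, ref, bx, by, n, radius):
    """Full-search integer-pel block matching (an assumed, simple search).

    Returns ((dx, dy), pred): the motion vector minimising the SAD and the
    n x n predicted block P(x,y) cut from the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + n > h or x0 + n > w:
                continue  # candidate lies partly outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    pred = [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]
    return (dx, dy), pred
```

Real encoders typically add sub-pel refinement and faster search patterns; the exhaustive search is kept here only because it is easy to verify.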

In SI-encoding mode, switch 1190 is operated to obtain a prediction for
the current block of the frame being coded from intra prediction block
1180. Intra prediction block 1180 forms the prediction P(x,y) for the
current block of the frame being encoded in a manner analogous to
that used in intra prediction known from prior art. More specifically,
intra prediction block 1180 forms the prediction P(x,y) for the current
block of the frame being encoded using spatial prediction from already
encoded neighbouring pixels within the frame being encoded.
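
The text does not fix a particular set of intra prediction modes. As a hedged illustration of "spatial prediction from already encoded neighbouring pixels", the sketch below implements a DC-style mode (mean of the row above and the column to the left), similar to modes found in prior-art codecs. The function name, the mode itself and the fallback value 128 for blocks without neighbours (8-bit video) are assumptions.

```python
def intra_dc_prediction(frame, bx, by, n):
    """Predict an n x n block as the mean of the already-encoded pixels
    directly above and to the left of it (an assumed DC-style mode)."""
    neighbours = []
    if by > 0:
        neighbours += frame[by - 1][bx:bx + n]                  # row above
    if bx > 0:
        neighbours += [frame[by + j][bx - 1] for j in range(n)]  # left column
    # Without neighbours, fall back to mid-grey for 8-bit samples (assumed)
    dc = round(sum(neighbours) / len(neighbours)) if neighbours else 128
    return [[dc] * n for _ in range(n)]
```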

In both SP- and SI-coding modes, the prediction P(x,y) takes the form
of a block of pixel values. A forward transform, for example a Discrete
Cosine Transform (DCT), is applied to the predicted block of pixel
values P(x,y) in block 1160, and the resulting transform coefficients,
referred to as c_pred, are subsequently quantised in quantisation block
1150 to form quantised transform coefficients l_pred. Corresponding
operations are also performed on the original image data. More
specifically, the current block of pixel values of the original image being
encoded is applied to transform block 1110. Here, a forward transform
(e.g. a DCT) is applied to the pixel values of the original image block to
form transform coefficients c_orig. These transform coefficients are
passed to quantisation block 1120, where they are quantised to form
quantised transform coefficients l_orig. The summing element 1130
receives both sets of quantised transform coefficients l_pred and l_orig
from the respective quantisation blocks 1150 and 1120 and generates a
set of quantised prediction error coefficients l_err according to the
relationship:

$$l_{\mathrm{err}} = l_{\mathrm{orig}} - l_{\mathrm{pred}}.$$
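
To make the quantised-domain computation of the first embodiment concrete, the following sketch forms l_pred, l_orig and l_err for a single block. It rests on assumptions not taken from the text: a 4x4 orthonormal DCT-II stands in for "for example a DCT", a uniform scalar quantiser with one step size stands in for the quantisation parameter, and the helper names (`dct2`, `quantise`, `encode_s_block`) are hypothetical.

```python
import math


def dct_matrix(n):
    # Orthonormal DCT-II basis matrix; row k samples basis function k
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    # Separable 2-D forward transform: C * X * C^T
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def quantise(coef, step):
    # Assumed uniform scalar quantiser, rounding half away from zero
    return [[int(math.copysign(math.floor(abs(x) / step + 0.5), x)) for x in row]
            for row in coef]


def encode_s_block(orig, pred, step):
    """Return (l_pred, l_err) for one block, both quantised with the same step."""
    l_pred = quantise(dct2(pred), step)   # transform 1160, quantisation 1150
    l_orig = quantise(dct2(orig), step)   # transform 1110, quantisation 1120
    l_err = [[o - p for o, p in zip(row_o, row_p)]   # summing element 1130
             for row_o, row_p in zip(l_orig, l_pred)]
    return l_pred, l_err
```

For a flat original block of value 100 predicted by a flat block of value 90 with step 10, only the DC level is non-zero: l_pred has DC level 36 and l_err has DC level 4.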

The quantised prediction error coefficients l_err are passed to
multiplexer 1135. If the current block is encoded in SP-format/mode,
multiplexer 1135 also receives the motion vectors for the SP-coded
block. If the current block is encoded in SI-format/mode, information
concerning the intra prediction mode used to form the prediction for the
SI-coded block in intra prediction block 1180 is passed to the
multiplexer. Advantageously, variable length coding is applied to the
quantised prediction error coefficients l_err and to the motion vector or
intra prediction mode information in the multiplexer 1135. A bit-stream
is formed by multiplexing together the various forms of information, and
the bit-stream thus formed is transmitted to a corresponding decoder
1200, 1300 (see Figs. 12 and 13).

The S-frame encoder 1100 according to the invention also comprises
local decoding functionality. The quantised prediction transform
coefficients l_pred formed in quantisation block 1150 are supplied to the
summing element 1140, which also receives the quantised prediction
error coefficients l_err. The summing element 1140 recombines the
quantised prediction transform coefficients l_pred and the quantised
prediction error coefficients l_err to form a set of reconstructed
quantised transform coefficients l_rec according to the relationship:

$$l_{\mathrm{rec}} = l_{\mathrm{pred}} + l_{\mathrm{err}}.$$

The reconstructed quantised transform coefficients are passed to
inverse quantisation block 1142, which inverse quantises them to form
inverse quantised reconstructed transform coefficients d_rec. The
inverse quantised reconstructed transform coefficients are further
passed to inverse transform block 1144, where they are subjected to,
e.g., an Inverse Discrete Cosine Transform (IDCT), or any other inverse
transform corresponding to the transform performed in block 1160. As
a result, a block of reconstructed pixel values is formed for the image
block in question and is stored in frame memory 1146. As subsequent
blocks of the frame being encoded in S-frame format undergo the
previously described encoding and local decoding operations, a
decoded version of the current frame is progressively assembled in the
frame memory, from where it can be accessed and used in intra
prediction of subsequent blocks of the same frame or in inter (motion
compensated) prediction of subsequent frames in the video sequence.

Operation of a generic S-frame decoder according to a first
embodiment of the invention will now be described with reference to
Figure 12.

The bit-stream generated by the S-frame encoder previously described
in connection with Figure 11 is received by decoder 1200 and is
demultiplexed into its constituent parts by demultiplexer 1210. The
decoder reconstructs a decoded version of the S-frame on a
block-by-block basis. As previously described, an S-frame may comprise
intra-blocks, SP-coded and SI-coded image blocks. For SP-format image
blocks, the information in the received bit-stream comprises VLC
encoded motion coefficient information and VLC encoded quantised
prediction error coefficients l_err. For image blocks encoded in SI-format,
the information in the received bit-stream comprises VLC coded
information relating to the intra prediction mode used to form the intra
prediction for the SI-coded block, together with VLC coded quantised
prediction error coefficients l_err.

When decoding an SP-coded block, demultiplexer 1210 first applies
appropriate variable length decoding (VLD) to the received bit-stream
to recover the motion vector information and quantised prediction error
coefficients l_err. It then separates the motion vector information from the
quantised prediction error coefficients l_err. The motion vector
information is supplied to motion compensated prediction block 1260,
and the quantised prediction error coefficients recovered from the
bit-stream are applied to one input of summing element 1220. The motion
vector information is used in motion compensated prediction block
1260, together with pixel values of a previously reconstructed frame
held in frame memory 1250, to form a prediction P(x,y) in a manner
analogous to that employed in the encoder 1100.

When decoding an SI-coded block, demultiplexer 1210 applies
appropriate variable length decoding to the received intra prediction
mode information and the quantised prediction error coefficients l_err.
The intra prediction mode information is then separated from the
quantised prediction error coefficients and supplied to intra prediction
block 1270. The quantised prediction error coefficients l_err are supplied
to one input of the summing element 1220. The intra prediction mode
information is used in intra prediction block 1270, in conjunction with
previously decoded pixel values of the current frame held in frame
memory 1250, to form a prediction P(x,y) for the current block being
decoded. Again, the intra prediction process performed in decoder
1200 is analogous to that performed in encoder 1100 and previously
described.

Once a prediction for the current block of the frame being decoded has
been formed, switch 1280 is operated so that the prediction P(x,y),
which comprises predicted pixel values, is supplied to transform block
1290. Again, switch 1280 is an abstract construction used in the
description of the invention, not necessarily a physical device. In the
case of an SP-coded block, switch 1280 is operated to connect motion
compensated prediction block 1260 to transform block 1290, while in
the case of an SI-coded block it is operated to connect intra prediction
block 1270 to transform block 1290.

In block 1290, a forward transform, e.g. a Discrete Cosine Transform
(DCT), is applied to the predicted block of pixel values P(x,y), and the
resulting transform coefficients c_pred are supplied to quantisation block
1295, where they are quantised to form quantised transform coefficients
l_pred. Quantised transform coefficients l_pred are then supplied to the
second input of summing element 1220, where they are added to the
prediction error coefficients l_err to form reconstructed quantised
transform coefficients l_rec according to the relationship:

$$l_{\mathrm{rec}} = l_{\mathrm{pred}} + l_{\mathrm{err}}.$$

The reconstructed quantised transform coefficients l_rec are further
supplied to inverse quantisation block 1230, where they are inverse
quantised to form inverse quantised reconstructed transform
coefficients d_rec. The inverse quantised transform coefficients d_rec are
then passed to inverse transform block 1240, where they are subjected
to, e.g., an Inverse Discrete Cosine Transform (IDCT), or any other
inverse transform corresponding to the transform performed in block
1290. In this way, a block of reconstructed pixel values is formed for
the image block in question. The reconstructed pixel values are
supplied to the video output and to frame memory 1250. As
subsequent blocks of the S-frame being decoded undergo the
previously described decoding operations, a decoded version of the
current frame is progressively assembled in frame memory 1250, from
where it can be accessed and used in the intra prediction of
subsequent blocks of the same frame or in the inter (motion
compensated) prediction of subsequent frames in the video sequence.
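
The per-block decoding path of decoder 1200 (which mirrors the local decoding loop of encoder 1100) can be sketched as follows. Assumptions not taken from the text: a 4x4 orthonormal DCT-II and its inverse stand in for the transform pair, a uniform scalar quantiser with one step size stands in for the quantisation parameter, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    # Orthonormal DCT-II basis matrix; row k samples basis function k
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))   # C * X * C^T


def idct2(coef):
    c = dct_matrix(len(coef))
    return matmul(matmul(transpose(c), coef), c)    # C^T * Y * C


def quantise(coef, step):
    # Assumed uniform scalar quantiser, rounding half away from zero
    return [[int(math.copysign(math.floor(abs(x) / step + 0.5), x)) for x in row]
            for row in coef]


def decode_s_block(l_err, pred, step):
    """Reconstruct one SP- or SI-coded block (first embodiment).

    l_err: received quantised prediction error levels (after VLD/demux 1210)
    pred:  predicted pixel block P(x,y) from block 1260 or 1270
    step:  quantiser step shared by l_pred and l_err (assumed)
    """
    l_pred = quantise(dct2(pred), step)                  # blocks 1290, 1295
    l_rec = [[p + e for p, e in zip(rp, re)]             # summing element 1220
             for rp, re in zip(l_pred, l_err)]
    d_rec = [[lv * step for lv in row] for row in l_rec]  # inverse quantisation 1230
    return idct2(d_rec)                                  # inverse transform 1240
```

With a flat prediction of value 90, step 10 and a received DC error level of 4 (all other levels zero), the block reconstructs to a flat block of value 100.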

Having reviewed the structure and function of an S-frame encoder and
decoder according to a first embodiment of the invention, it is now
possible to understand how S-frames according to the invention make
it possible to switch between bit-streams without mismatch errors such
as those encountered in prior video encoding/decoding systems.
Referring once more to the bit-stream switching example depicted in
Figure 5, switching from the first bit-stream 510 to the second
bit-stream 520 occurs at the location of S-frames S1 (513) and S2 (523)
in the respective bit-streams. As previously noted, when switching is
performed, a secondary S-frame, denoted S12 (550), is encoded and
transmitted. The secondary S-frame S12 is encoded using the encoding
parameters of the second bit-stream 520 and reference frames from
the first bit-stream 510 in such a way that, when the secondary frame
S12 is decoded, its reconstructed pixel values are identical to those that
would have resulted from transmission of frame S2 in the second
bit-stream.

Let l^2_err and l^2_pred denote the quantised coefficients of the
prediction error and the prediction frame, respectively, obtained from
encoding the SP-frame S2 with the procedure described above, and let
l^2_rec denote the quantised reconstructed image coefficients of the
S-frame S2. The encoding of the secondary S-frame 550 (S12) follows
the same procedure as the encoding of the S-frame 523 (S2), with the
following exceptions: 1) the reference frame(s) used in the prediction
of each block of the secondary S-frame S12 are the reconstructed
frames obtained by decoding the first bit stream 510 up to the current
nth frame in the video sequence; 2) the quantised prediction error
coefficients are calculated as

$$l^{12}_{\mathrm{err}} = l^{2}_{\mathrm{rec}} - l^{12}_{\mathrm{pred}},$$

where l^12_pred denotes the quantised prediction transform coefficients
of the secondary S-frame. The quantised prediction error coefficients
l^12_err and the motion vectors are transmitted to the decoder 1200.

When the secondary S-frame S12 is decoded in the decoder 1200 using
the reconstructed frames from the first bit stream 510 before the switch
as reference frames, the coefficients l^12_pred of the secondary S-frame
are constructed and added to the received quantised prediction error
coefficients l^12_err as described above, i.e.

$$l^{12}_{\mathrm{rec}} = l^{12}_{\mathrm{err}} + l^{12}_{\mathrm{pred}} = l^{2}_{\mathrm{rec}} - l^{12}_{\mathrm{pred}} + l^{12}_{\mathrm{pred}} = l^{2}_{\mathrm{rec}}.$$

It can be noted from this equation that l^12_rec and l^2_rec are
identical. Therefore, although the secondary S-frame S12 and the
S-frame S2 of the second bit stream have different reference frames,
when S12 is decoded it produces an image with reconstructed pixel
values identical to those that would result from decoding S-frame S2.
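
The identity l^12_rec = l^2_rec can be checked numerically. The sketch below works directly on a short list of transform coefficients (skipping the pixel-domain transform) and assumes a uniform scalar quantiser; the coefficient values and function names are made up for illustration. The point is that S12's prediction c12_pred, formed from first-bit-stream references, differs from S2's prediction, yet the decoded levels coincide.

```python
import math


def quantise(values, step):
    # Assumed uniform scalar quantiser, rounding half away from zero
    return [int(math.copysign(math.floor(abs(v) / step + 0.5), v)) for v in values]


def encode_s(c_orig, c_pred, step):
    # Ordinary S-frame block (e.g. S2): l_err = l_orig - l_pred
    l_orig, l_pred = quantise(c_orig, step), quantise(c_pred, step)
    return [o - p for o, p in zip(l_orig, l_pred)]


def encode_secondary(l2_rec, c12_pred, step):
    # Secondary S-frame block S12: l12_err = l2_rec - l12_pred, where c12_pred
    # is the transformed prediction formed from first-bit-stream references
    return [r - p for r, p in zip(l2_rec, quantise(c12_pred, step))]


def decode(l_err, c_pred, step):
    # Decoder: l_rec = l_pred + l_err, computed in the quantised domain
    return [p + e for p, e in zip(quantise(c_pred, step), l_err)]


step = 6
c_orig = [410.0, -37.5, 12.2, 3.9]
c2_pred = [398.0, -30.1, 5.0, 0.7]      # prediction from bit-stream-2 references
c12_pred = [371.4, -44.0, 20.3, -8.8]   # prediction from bit-stream-1 references
l2_rec = decode(encode_s(c_orig, c2_pred, step), c2_pred, step)
l12_rec = decode(encode_secondary(l2_rec, c12_pred, step), c12_pred, step)
print(l2_rec, l12_rec)   # the two level lists are identical
```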

From the foregoing description of the encoding and decoding of
S-frames according to the invention, it will be appreciated that a
significant difference exists compared with the encoding and decoding
of P-frames and I-frames according to prior art. Specifically, when an
image block is encoded or decoded in SP- or SI-format, the prediction
P(x,y) for the block in question is transformed into the transform
coefficient domain by applying a transformation such as a Discrete
Cosine Transform. The transform coefficients thus produced are
subsequently quantised, and a prediction error is determined in the
quantised coefficient domain. This contrasts with predictive coding
according to prior art, in which a prediction error is determined in the
spatial (pixel value) domain.
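
For contrast, a minimal sketch of the prior-art situation: the prediction error is formed in the pixel domain against a prediction built from the second bit stream's reference frames, but a decoder that holds only first-bit-stream frames forms a different prediction. Quantisation of the residual is omitted for brevity, and all values and names are illustrative. The reconstruction is off by exactly the difference between the two predictions, and this mismatch would then propagate to subsequent predicted frames.

```python
def encode_p(orig, pred):
    # Prior-art style: prediction error formed in the pixel domain
    return [o - p for o, p in zip(orig, pred)]


def decode_p(residual, pred):
    return [r + p for r, p in zip(residual, pred)]


orig = [101, 104, 97, 99]
pred_stream2 = [100, 102, 98, 101]   # encoder's prediction (bit-stream-2 refs)
pred_stream1 = [97, 103, 99, 100]    # decoder's prediction (bit-stream-1 refs)
residual = encode_p(orig, pred_stream2)
mismatched = decode_p(residual, pred_stream1)
print(mismatched)  # differs from orig by pred_stream1 - pred_stream2
```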

In the following, operation of the decoder 1200 during switching
between bit-streams 510 and 520 is described in detail. At the position
in the video sequence where switching from the first bit-stream 510 to
the second bit-stream 520 takes place, the decoder 1200 has already
received and decoded the previous P-frames 511 and 512 of the first
bit-stream 510. The decoded frames are stored in frame memory 1250
and are thus available for use as reference frames. When switching
from the first bit-stream 510 to the second bit-stream 520 takes place,
encoder 1100 (Figure 11) constructs and encodes the secondary
S-frame S12 (550) and transmits the encoded video information
representing S12 to decoder 1200.

As previously explained, encoding is performed on a block-by-block
basis. Specifically, the secondary S-frame S12 is encoded as an
assembly of image blocks, and generally each image block is encoded
as either an SP-coded block, an SI-coded block or an intra-block. For the
SP-coded blocks of the secondary S-frame S12, the compressed video
information transmitted from encoder to decoder takes the form of
quantised prediction error transform coefficients l^12_err and motion
vector information. For the SI-coded blocks of the secondary S-frame
S12, the compressed video information comprises quantised prediction
error transform coefficients l^12_err and information relating to the intra
prediction mode used in forming the prediction for the SI-coded block in
the encoder. As previously explained, the compressed video
information is subjected to appropriate variable length coding (VLC)
prior to transmission from the encoder, in order to further reduce the
number of bits required for its representation.

The compressed video information for a given image block is received
at decoder 1200, is first subjected to appropriate variable length
decoding (VLD) and is separated into its constituent parts by
demultiplexer 1210. The quantised prediction error coefficients l^12_err
extracted from the received bit-stream are applied to the first input of
summer 1220, and a block of predicted pixel values P(x,y) is formed for
each image block according to its coding mode (SP or SI). In the case
of an SP-coded block, the block of predicted pixel values P(x,y) is
formed in motion compensated prediction block 1260 using a reference
frame from the first bit-stream (e.g. P-frame 511 or 512) available in
frame memory 1250 and the motion vector information extracted by
demultiplexer 1210 from the encoded video information of the secondary
S-frame S12. In the case of an SI-coded block, a block of
predicted pixel values P(x,y) is formed in intra prediction block 1270
using previously decoded pixels of the secondary S-frame S12, also
stored in frame memory 1250. Intra prediction is performed according
to the intra prediction mode information extracted by demultiplexer 1210
from the received video information for the secondary S-frame S12.

Once a prediction for the current block of the secondary S-frame has
been formed, the predicted pixel values P(x,y) are passed to transform
block 1290. Here a forward transform (e.g. a Discrete Cosine
Transform (DCT)) is applied to the predicted pixel values P(x,y) to form a
set of transform coefficients c_pred. These transform coefficients are then
passed to quantisation block 1295, where they are quantised to form
quantised transform coefficients l^12_pred. The quantised transform
coefficients l^12_pred are, in turn, applied to the second input of summer
1220. Summer 1220 combines the quantised transform coefficients
l^12_pred with the quantised prediction error transform coefficients
l^12_err to form reconstructed quantised transform coefficients l^12_rec
according to the relationship:

$$l^{12}_{\mathrm{rec}} = l^{12}_{\mathrm{pred}} + l^{12}_{\mathrm{err}}.$$
The reconstructed quantised transform coefficients l^12_rec are then
supplied to inverse quantisation block 1230, where they are inverse
quantised to form inverse quantised reconstructed transform
coefficients d^12_rec. The inverse quantised reconstructed transform
coefficients d^12_rec are then passed to inverse transform block 1240,
where they are subjected to an inverse transform operation (e.g. an
Inverse Discrete Cosine Transform (IDCT)). As a result, a block of
reconstructed pixel values is formed for the current block of the
secondary S-frame S12. The reconstructed pixel values I_c(x,y) are
supplied to the video output and to frame memory 1250. As
subsequent blocks of the secondary S-frame S12 are encoded,
transmitted from encoder 1100 to decoder 1200 and subsequently
decoded, a decoded version of the secondary S-frame is progressively
built up in frame memory 1250. From there, the already decoded
blocks of the secondary S-frame can be retrieved and used by intra
prediction block 1270 to form predicted pixel values P(x,y) for
subsequent blocks of the secondary S-frame S12. Here it should be
remembered that the quantised prediction error transform coefficients
for each image block of the secondary S-frame S12 are generated in
encoder 1100 according to the relationship:

$$l^{12}_{\mathrm{err}} = l^{2}_{\mathrm{rec}} - l^{12}_{\mathrm{pred}},$$

where l^2_rec are the quantised reconstructed transform coefficient
values produced by encoding and subsequently decoding S-frame S2 in
the second bit-stream. This means that the reconstructed transform
coefficients l^12_rec produced by decoding the compressed video
information for the secondary S-frame S12 are identical to those which
would have been produced if S-frame S2 from the second bit-stream
had been transmitted and decoded. As previously explained, this
follows since:

$$l^{12}_{\mathrm{rec}} = l^{12}_{\mathrm{pred}} + l^{12}_{\mathrm{err}} = l^{12}_{\mathrm{pred}} + l^{2}_{\mathrm{rec}} - l^{12}_{\mathrm{pred}} = l^{2}_{\mathrm{rec}}.$$

Thus, l^12_rec = l^2_rec.



Therefore, by constructing a secondary S-frame S12 according to the
method of the invention, transmitting it from encoder to decoder and
subsequently decoding it, mismatch-free switching between a first and
a second bit-stream can be achieved.

Consider the case in which the secondary S-frame is an SI-frame while
the S-frame within the bit stream is an SP-frame. In this case, a frame
which uses motion-compensated prediction is represented by a frame
which uses only spatial prediction. This special case has implications for
random access and error resiliency, which will be described later.

It should be noted that in encoder 1100 and decoder 1200 according to
the first embodiment of the invention, described above, the
quantisation applied to the transform coefficients c_pred produced in
transform blocks 1160 (encoder) and 1290 (decoder) to yield quantised
transform coefficients l_pred is the same as that used to generate the
quantised prediction error transform coefficients l_err. More specifically,
in the first embodiment of the invention, when a block of predicted pixel
values P(x,y) is produced for an image block of an S-frame which is
being encoded/decoded, the quantisation parameter QP used to
quantise the transform coefficients c_pred corresponding to the predicted
block of pixel values P(x,y) should be the same as the quantisation
parameter used in generating the quantised prediction error transform
coefficients l_err. This is advantageous because the summation performed
in order to produce the reconstructed transform coefficients l_rec is
performed in the quantised transform coefficient domain, i.e.

$$l_{\mathrm{rec}} = l_{\mathrm{pred}} + l_{\mathrm{err}},$$

so failure to use identical quantisation parameters in the construction of
l_pred and l_err will lead to errors in the reconstructed quantised transform
coefficients l_rec.
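
Why the step sizes must match can be seen with a single coefficient. In the hedged sketch below (uniform scalar quantiser assumed; names and numbers are illustrative), l_err is formed with step 8, while l_pred is formed either with the same step or with a different step of 4. Adding levels is only equivalent to adding coefficient values when both levels share the same step.

```python
import math


def quantise(c, step):
    # Assumed uniform scalar quantiser, rounding half away from zero
    return int(math.copysign(math.floor(abs(c) / step + 0.5), c))


def reconstruct(c_orig, c_pred, step_err, step_pred):
    """First-embodiment coefficient path for one coefficient.

    l_err is formed with step_err; the level l_pred is formed with step_pred;
    l_rec = l_pred + l_err is then dequantised with step_err."""
    l_err = quantise(c_orig, step_err) - quantise(c_pred, step_err)
    l_pred = quantise(c_pred, step_pred)
    return (l_pred + l_err) * step_err
```

With c_orig = 57 and c_pred = 40, matching steps (8 and 8) reconstruct 56, close to the original, whereas mismatched steps (8 and 4) give 96, because the level 10 produced with step 4 is interpreted as if it had been produced with step 8.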

Figure 15 shows a block diagram of an S-frame encoder 1500
according to a second embodiment of the invention, which provides
greater flexibility in the choice of quantisation parameters used to
generate the quantised transform coefficients l_pred and l_err. As can be
seen by comparing Figure 15 with Figure 11, the principal difference
between the S-frame encoder 1500 according to the second
embodiment of the invention and the S-frame encoder 1100 according
to the first embodiment concerns the location of quantisation blocks
1525 and 1550. The operation of the S-frame encoder 1500 according
to the second embodiment of the invention will now be described in
detail with reference to Figure 15.

According to the second embodiment of the invention, a video frame to
be encoded in S-frame format is first partitioned into blocks, and each
block is then encoded as either an SP-block or an SI-block. Switch
1585 is operated as appropriate to switch between the SP and SI
coding modes. In SP-coding mode, switch 1585 is operated to obtain a
motion compensated prediction for the current block of the frame being
encoded from motion compensated prediction block 1575. Motion
compensated prediction block 1575 forms a block of predicted pixel
values P(x,y) for the current block of the frame being encoded by
determining a motion vector describing the relationship between the
pixels of the current block and pixel values of a reconstructed reference
frame held in frame memory 1570.

In SI-coding mode, switch 1585 is operated to obtain a prediction for
the current block of the frame being encoded from intra prediction block
1580. Intra prediction block 1580 operates in a manner analogous to
that described in connection with the first embodiment of the invention
to form a block of predicted pixel values P(x,y) for the current block of
the frame being encoded using spatial prediction from already encoded
neighbouring pixels within the frame being encoded.

In both SP- and SI-coding modes, a forward transform, for example a
Discrete Cosine Transform (DCT), is applied to the predicted block of
pixel values P(x,y) in transform block 1590. The resulting transform
coefficients c_pred are supplied to summers 1520 and 1540. The original
image data, comprising the actual pixel values of the image block
currently being encoded, are passed to transform block 1510, where
they too are subjected to a forward transform (e.g. a DCT). The resulting
transform coefficients c_orig are then passed to summer 1520, which
forms the difference between c_orig and c_pred to generate prediction
error transform coefficients c_err according to the relationship:

$$c_{\mathrm{err}} = c_{\mathrm{orig}} - c_{\mathrm{pred}}.$$

The prediction error transform coefficients are supplied to quantisation
block 1525, where they are quantised using quantisation parameter
PQP to form quantised prediction error transform coefficients l_err, which
are in turn passed to multiplexer 1540.

If the current block is being encoded in SP-format, multiplexer 1540
also receives information relating to the motion vectors used in
forming the motion compensated prediction P(x,y) for the SP-coded
block. If the current block is being encoded in SI-format, information
concerning the intra prediction mode used to form the prediction P(x,y)
for the SI-coded block is also passed to the multiplexer.
Advantageously, multiplexer 1540 applies appropriate variable length
coding (VLC) to the quantised prediction error transform coefficients
l_err and to the motion vector or intra prediction mode information, and
forms a bit-stream for transmission to a corresponding decoder by
multiplexing together the various forms of information.

The quantised prediction error transform coefficients l_err are passed
from quantisation block 1525 to inverse quantisation block 1530, where
they are inverse quantised using quantisation parameter PQP to form
inverse quantised prediction error transform coefficients d_err. The
inverse quantised prediction error transform coefficients d_err are next
passed to summer 1540, where they are combined with the transform
coefficients c_pred produced from the predicted pixel values P(x,y) for the
current block. More specifically, summer 1540 adds transform
coefficients c_pred and inverse quantised prediction error transform
coefficients d_err to form reconstructed transform coefficients c_rec
according to the relationship:

$$c_{\mathrm{rec}} = c_{\mathrm{pred}} + d_{\mathrm{err}}.$$

The reconstructed transform coefficients c rec are then passed to 
quantisation block 1550 where they are quantised using quantisation 
parameter SPQP to produce quantised reconstructed transform 
coefficients l rec . It should be noted that quantisation parameter SPQP 
used to quantise the reconstructed transform coefficients is not 
necessarily the same as the quantisation parameter PQP used to 
quantise the prediction error transform coefficients c err in quantisation 
block 1525. In particular, a finer quantisation can be applied to 
reconstructed transform coefficients c rec and a courser quantisation can 



CA 02431866 2003-06-13 



30 

be applied to prediction error coefficients c_err. This ultimately results in
a smaller reconstruction error (distortion) when a decoded image is
formed in the decoder.
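The coefficient path of an SP-coded block through quantisation block 1525, inverse quantisation block 1530, the summer and quantisation block 1550 can be sketched as follows. This is a minimal sketch under stated assumptions: PQP and SPQP are treated directly as uniform quantiser step sizes (a real codec maps quantisation parameters to step sizes through tables), the prediction error coefficients are assumed to be the difference between the original and predicted coefficients, and `quantise`, `dequantise` and `sp_encode_block` are hypothetical names.

```python
import math

# Assumption: PQP and SPQP are used directly as uniform quantiser step sizes.

def quantise(c, step):
    """Uniform scalar quantiser, rounding half away from zero."""
    level = int(math.floor(abs(c) / step + 0.5))
    return level if c >= 0 else -level

def dequantise(level, step):
    """Inverse quantisation: scale a level back by the step size."""
    return level * step

def sp_encode_block(c_orig, c_pred, pqp, spqp):
    """Return (I_err, I_rec) for one SP-coded block.

    c_orig: transform coefficients of the original block (assumed input)
    c_pred: transform coefficients of the prediction P(x,y)
    """
    i_err, i_rec = [], []
    for co, cp in zip(c_orig, c_pred):
        level = quantise(co - cp, pqp)        # quantisation block 1525 (PQP)
        d_err = dequantise(level, pqp)        # inverse quantisation block 1530
        c_rec = cp + d_err                    # summer: c_rec = c_pred + d_err
        i_err.append(level)                   # passed on to the multiplexer
        i_rec.append(quantise(c_rec, spqp))   # quantisation block 1550 (SPQP)
    return i_err, i_rec

print(sp_encode_block([100.0, -37.0, 12.0], [90.0, -30.0, 0.0], pqp=8, spqp=4))
# prints ([1, -1, 2], [25, -10, 4])
```

With a coarse PQP and a finer SPQP, as in this example, the transmitted levels I_err stay small while the reconstruction is held on the finer SPQP grid.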

Quantised reconstructed transform coefficients I_rec are next supplied to
inverse quantisation block 1560, where they are inverse quantised
using quantisation parameter SPQP to form inverse quantised
reconstructed transform coefficients d_rec. The inverse quantised
reconstructed transform coefficients d_rec are then passed to inverse
transform block 1565, where they are subjected to an inverse transform
operation, e.g. an Inverse Discrete Cosine Transform (IDCT). As a
result of this operation, a block of reconstructed pixel values I_c(x,y) is
formed for the image block in question. The block of reconstructed
pixel values I_c(x,y) is subsequently stored in frame memory 1570. As
subsequent blocks of the frame being encoded in S-frame format
undergo the previously described encoding and local decoding
operations, a decoded version of the current frame is progressively
assembled in frame memory 1570, from where it can be accessed and
used in intra prediction of subsequent blocks of the same frame or in
inter (motion compensated) prediction of subsequent frames in the
video sequence.
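The final local decoding step, from d_rec to the pixel block I_c(x,y), can be illustrated with an orthonormal 4x4 DCT pair. This is an assumption made for illustration: the text names the IDCT only as an example, and H.26L itself uses an integer approximation of the DCT. The function names and the 8-bit clipping range are likewise assumed.

```python
import math

# Illustrative assumption: an orthonormal floating-point DCT stands in for the
# transform; H.26L actually uses an integer approximation.

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix T (row k = k-th basis function)."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """Forward 2-D DCT: C = T X T^T."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))

def idct2(coeffs):
    """Inverse 2-D DCT: X = T^T C T (the role of inverse transform block 1565)."""
    t = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)

def reconstruct_pixels(d_rec):
    """I_c(x,y): inverse transform of d_rec, rounded and clipped to 0..255."""
    return [[min(255, max(0, int(round(v)))) for v in row] for row in idct2(d_rec)]
```

Because T is orthonormal, `idct2(dct2(x))` returns `x` up to floating-point rounding, which the final rounding step removes.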

Operation of an S-frame decoder 1300 according to the second
embodiment of the invention will now be described with reference to
Figure 13. The bit-stream generated by the S-frame encoder 1500
according to the second embodiment of the invention and described
above in connection with Figure 15 is received by decoder 1300 and is
demultiplexed into its constituent parts. The decoder reconstructs a
decoded version of the S-frame on a block-by-block basis. As
previously described, an S-frame generally comprises both SP-coded
and SI-coded image blocks. For SP-coded image blocks, the
information in the received bit-stream comprises VLC encoded motion
vector information and VLC encoded quantised prediction error
transform coefficients I_err. For image blocks encoded in SI format, the
information in the received bit-stream comprises VLC encoded
information relating to the intra prediction mode used to form the intra
prediction for the SI-coded block, as well as VLC encoded quantised
prediction error transform coefficients I_err.

When decoding an SP-coded image block, demultiplexer 1310 first
applies appropriate variable length decoding (VLD) to the received
bit-stream to recover the motion vector information and quantised
prediction error coefficients I_err. It then separates the motion vector
information from the quantised prediction error coefficients I_err. The
motion vector information is supplied to motion compensated prediction
block 1370 and the quantised prediction error coefficients I_err recovered
from the received bit-stream are applied to inverse quantisation block
1320. The motion vector information recovered from the received
bit-stream is used in motion compensated prediction block 1370, together
with pixel values of a previously reconstructed frame held in frame
memory 1360, to form a prediction P(x,y) for the current block being
decoded, in a manner analogous to that employed in encoder 1500.
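Motion compensated prediction block 1370 forms P(x,y) by fetching a displaced block from the previously reconstructed frame in frame memory 1360. The sketch below assumes integer-pixel motion vectors and clamps coordinates at the frame borders; both are simplifying assumptions (H.26L also supports fractional-pixel motion vectors with interpolation), and `mc_predict` is a hypothetical name.

```python
def mc_predict(ref, x0, y0, mv, size=4):
    """Prediction P(x,y) for the size x size block whose top-left pixel is
    (x0, y0), copied from reference frame `ref` (a list of rows) displaced by
    the integer motion vector mv = (dx, dy). Out-of-frame coordinates are
    clamped to the nearest edge pixel."""
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    pred = []
    for y in range(y0, y0 + size):
        ry = min(max(y + dy, 0), height - 1)
        pred.append([ref[ry][min(max(x + dx, 0), width - 1)]
                     for x in range(x0, x0 + size)])
    return pred

ref = [[10 * y + x for x in range(8)] for y in range(8)]
print(mc_predict(ref, 0, 0, (1, 2), size=2))  # prints [[21, 22], [31, 32]]
```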

When decoding an SI-coded image block, demultiplexer 1310 applies
appropriate variable length decoding to the received intra prediction
mode information and the quantised prediction error transform
coefficients I_err. The intra prediction mode information is then
separated from the quantised prediction error transform coefficients
I_err and is supplied to intra prediction block 1380. The quantised
prediction error transform coefficients I_err are supplied to inverse
quantisation block 1320. The intra prediction mode information
recovered from the received bit-stream is used in intra prediction
block 1380, in conjunction with previously decoded pixel values of the
current frame held in frame memory 1360, to form a prediction P(x,y)
for the current block being decoded. Again, the intra prediction
process performed in decoder 1300 is analogous to that performed in
the corresponding encoder 1500 previously described.
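Intra prediction block 1380 uses previously decoded pixels of the current frame. H.26L defines several intra prediction modes; the sketch below implements only a DC-style mode (the mean of the reconstructed pixels directly above and to the left of the block), with a mid-grey default of 128 when no neighbours are available. The choice of mode, the default value and the name `intra_dc_predict` are assumptions for illustration.

```python
def intra_dc_predict(frame, x0, y0, size=4):
    """DC-style intra prediction for the size x size block at (x0, y0),
    using already reconstructed neighbours in the current frame."""
    top = frame[y0 - 1][x0:x0 + size] if y0 > 0 else []
    left = [frame[y][x0 - 1] for y in range(y0, y0 + size)] if x0 > 0 else []
    neighbours = list(top) + left
    if neighbours:
        # Integer mean of the (non-negative) neighbour pixels, rounded half up.
        dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    else:
        dc = 128  # assumed default when no neighbour has been decoded yet
    return [[dc] * size for _ in range(size)]
```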

For both SP- and SI-coded image blocks, the quantised prediction error
transform coefficients I_err recovered from the received bit-stream are
inverse quantised in inverse quantisation block 1320 using quantisation
parameter PQP to form inverse quantised prediction error transform
coefficients d_err. The inverse quantised prediction error transform
coefficients d_err are applied to one input of summer 1325.

Once a prediction P(x,y) for the current block of the frame being
decoded has been formed, either by motion compensated prediction in
motion compensated prediction block 1370 or by intra prediction in
intra prediction block 1380, switch 1385 is operated as appropriate to
supply the predicted pixel values P(x,y) to transform block 1390. Here
a forward transform, for example a Discrete Cosine Transform (DCT), is
applied to the predicted block of pixel values P(x,y) to form transform
coefficients c_pred. Transform coefficients c_pred are then supplied to the
second input of summer 1325, where they are combined with the
inverse quantised prediction error transform coefficients d_err received
from inverse quantisation block 1320 to form reconstructed transform
coefficients c_rec. More specifically, the reconstructed transform
coefficients are determined by adding together transform coefficients
c_pred and inverse quantised prediction error transform coefficients d_err
according to the relationship:

$$c_{rec} = c_{pred} + d_{err}$$

The reconstructed transform coefficients c_rec are then passed to
quantisation block 1330, where they are quantised using quantisation
parameter SPQP to produce quantised reconstructed transform
coefficients I_rec. The quantised reconstructed transform coefficients
I_rec are, in turn, supplied to inverse quantisation block 1340, where
they are inverse quantised using quantisation parameter SPQP to form
inverse quantised reconstructed transform coefficients d_rec. The
inverse quantised reconstructed transform coefficients d_rec are then
passed to inverse transform block 1350, where they are subjected to an
inverse transform operation, for example an Inverse Discrete Cosine
Transform (IDCT). As a result of the inverse transformation applied in
inverse transform block 1350, a block of reconstructed image pixels
I_c(x,y) is formed for the image block in question. The block of
reconstructed pixels I_c(x,y) is supplied to the video output of the
decoder and to frame memory 1360, where the pixels are stored. As
subsequent blocks of the S-frame undergo the decoding operations
described above, a decoded version of the current frame is
progressively assembled in frame memory 1360. From there it can be
accessed and used in the intra prediction of subsequent blocks of the
same frame or in inter (motion compensated) prediction of subsequent
frames in the video sequence.
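The coefficient path of decoder 1300 (blocks 1320, 1325, 1330 and 1340) can be sketched as below, again assuming that PQP and SPQP act directly as uniform quantiser step sizes and using hypothetical function names. Because the decoder repeats exactly the quantisation steps of the encoder's local decoding loop, it arrives at the same levels I_rec, and hence the same d_rec, as the encoder, so no drift arises between them.

```python
import math

# Assumption: PQP and SPQP are used directly as uniform quantiser step sizes.

def quantise(c, step):
    """Uniform scalar quantiser, rounding half away from zero."""
    level = int(math.floor(abs(c) / step + 0.5))
    return level if c >= 0 else -level

def sp_decode_block(i_err, c_pred, pqp, spqp):
    """Return (I_rec, d_rec) for one SP- or SI-coded block."""
    i_rec, d_rec = [], []
    for level, cp in zip(i_err, c_pred):
        d_err = level * pqp              # inverse quantisation block 1320 (PQP)
        c_rec = cp + d_err               # summer 1325: c_rec = c_pred + d_err
        q = quantise(c_rec, spqp)        # quantisation block 1330 (SPQP)
        i_rec.append(q)
        d_rec.append(q * spqp)           # inverse quantisation block 1340 (SPQP)
    return i_rec, d_rec
```

For example, `sp_decode_block([1, -1, 2], [90.0, -30.0, 0.0], 8, 4)` returns `([25, -10, 4], [100, -40, 16])`.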

In Fig. 16, an encoder according to a third embodiment of the invention 
is shown. In this embodiment, the transform coefficients c_pred undergo
quantisation and inverse quantisation using the same quantisation 
parameter SPQP both in the encoder section (blocks 1625 and 1630) 
and in the decoder section (blocks 1692 and 1694). Therefore, the 
encoder does not introduce any additional quantisation error to the 
prediction loop, and error build-up in the prediction loop is thus 
effectively prevented. The blocks 1610, 1620, 1625, 1630, 1640, 1650, 
1660, 1665, 1670, 1675, 1680, 1685, 1690 have similar functionality to 
the blocks 1510, 1520, 1525, 1530, 1540, 1550, 1560, 1565, 1570, 
1575, 1580, 1585, 1590 presented in Fig. 15, respectively. 
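Quantising c_pred with SPQP in both encoder and decoder allows reconstruction to take place entirely on the SPQP grid, in the form d_rec = d_pred + d_err of claim 6, with the same quantisation parameter used for prediction and prediction error (claim 7). The sketch below illustrates why this enables switching: given the levels I_rec of a primary switching frame, a secondary switching frame built from a different prediction only has to carry I_err' = I_rec - I_pred', and the decoder then reconstructs exactly the same d_rec. This is a simplified model, not a block-by-block rendering of Fig. 16; the uniform quantiser, the example values and the function names are assumptions.

```python
import math

# Assumption: SPQP is used directly as a uniform quantiser step size, and the
# same step is used for the prediction and for the prediction error.

def quantise(c, step):
    """Uniform scalar quantiser, rounding half away from zero."""
    level = int(math.floor(abs(c) / step + 0.5))
    return level if c >= 0 else -level

def decode_levels(i_err, c_pred, spqp):
    """d_rec = d_pred + d_err, computed as SPQP * (I_pred + I_err)."""
    return [spqp * (quantise(cp, spqp) + e) for cp, e in zip(c_pred, i_err)]

def secondary_levels(i_rec, c_pred_other, spqp):
    """Levels I_err' = I_rec - I_pred' for a secondary switching frame whose
    prediction c_pred_other comes from another reference (or intra prediction)."""
    return [r - quantise(cp, spqp) for r, cp in zip(i_rec, c_pred_other)]

spqp = 4
c_pred_primary, i_err_primary = [37.0, -13.0, 6.0], [2, -1, 0]
i_rec = [quantise(cp, spqp) + e for cp, e in zip(c_pred_primary, i_err_primary)]
c_pred_other = [50.0, 0.0, -9.0]
i_err_secondary = secondary_levels(i_rec, c_pred_other, spqp)
print(decode_levels(i_err_primary, c_pred_primary, spqp))   # prints [44, -16, 8]
print(decode_levels(i_err_secondary, c_pred_other, spqp))   # prints [44, -16, 8]
```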

In Fig. 6, a decoder 600 according to an advantageous embodiment of
the invention is described. Most of the elements of decoder 600 are
the same as those of decoder 1200 presented in Fig. 12. The
operational blocks of decoder 600 are arranged to decode the
prediction blocks of the frames; no switching means are shown in
Fig. 6. Blocks 610, 615, 620, 630, 640, 650, 660 and 670 have similar
functionality to blocks 1210, 1220, 1230, 1240, 1250, 1260, 1290 and
1295 presented in Fig. 12, respectively.

In Fig. 9, a decoder 600 according to another advantageous
embodiment of the invention is described. The decoder 600 presented
in Fig. 9 is a modification of the decoder 600 shown in Fig. 6. The
difference between the decoder presented in Fig. 9 and the decoder
presented in Fig. 12 is that a normalisation block 680 is inserted
between demultiplexer 610 and one input of summing element 615.
Other blocks 610, 615, 620, 630, 640, 650, 660 and 670 have similar
functionality to blocks 1210, 1220, 1230, 1240, 1250, 1260, 1290
and 1295 presented in Fig. 12, respectively.

In Fig. 10, a decoder 600 according to yet another advantageous
embodiment of the invention is described. Most of the elements of
decoder 600 are the same as those of decoder 1300 presented in Fig.
13. The operational blocks of decoder 600 are arranged to decode the
prediction blocks of the frames; no switching means are shown in
Fig. 10. A further difference between the decoder presented in Fig. 13
and the decoder presented in Fig. 10 is that a normalisation block 680
is used instead of inverse quantisation block 1320. Other blocks 610,
615, 620, 630, 640, 650, 660 and 670 have similar functionality to
blocks 1310, 1325, 1330, 1340, 1350, 1360, 1370 and 1390 presented in
Fig. 13, respectively.

The encoding of a video frame can be performed on a block-by-block
basis, such that differently encoded areas can exist in the same
encoded video frame. For example, some parts of the frame can be inter
coded and other parts of the frame can be intra coded. The above
procedures are applied to each part of the frame appropriately,
according to the encoding procedure of the part in question.

In addition to the transmission network, other parts of the
transmission system may also originate a request to change the bit
stream transmission properties. For example, the receiver may request
the transmitting server to change the parameters for some reason. This
request is delivered to the transmitting server, e.g. via the
transmission network.

Although H.26L is used here as an example of a standard, embodiments
of the present invention based on other video coding standards, and any
variations and modifications thereof, are deemed to be within the scope
of the present invention.

Bit-stream switching is not the only application in which the present
invention can be applied. If one of the bit streams has a lower temporal
resolution, e.g. 1 frame/sec, this bit stream can be used to provide
fast-forward functionality. Specifically, decoding from the bit stream
with a lower temporal resolution and then switching to the bit stream
with a normal frame rate would provide such functionality. Fig. 8
depicts two bit streams, the second of which comprises only S-frames
predicted from each other at intervals greater than the frame
repetition interval of the first bit-stream. Furthermore, "Fast
Forward" can start and stop at any location in the bit-stream. In the
following, some other applications of the present invention are
described.
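A hypothetical transmission schedule for the fast-forward case can be sketched as follows. It assumes that the low-frame-rate stream carries S-frames every `interval` frames at positions shared with S-frames of the normal stream, and that normal play resumes at position `stop` by sending a secondary switching frame encoded with the normal stream's parameters and predicted from the last fast-forward frame. The function name and the frame-index model are illustrative only.

```python
def fast_forward_schedule(start, stop, interval, end):
    """Hypothetical list of (source, frame index) pairs to transmit.

    "ff": S-frames of the low-frame-rate stream, from start up to stop
    "switch": secondary switching frame replacing the normal stream's S-frame
              at position stop
    "normal": subsequent frames of the normal-rate stream, up to end - 1
    """
    if start % interval or stop % interval or stop <= start:
        raise ValueError("start/stop must be S-frame positions with start < stop")
    plan = [("ff", i) for i in range(start, stop, interval)]
    plan.append(("switch", stop))
    plan += [("normal", i) for i in range(stop + 1, end)]
    return plan

print(fast_forward_schedule(0, 75, 25, 78))
```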

Splicing and Random Access 

The bit-stream switching example discussed earlier considered bit
streams belonging to the same sequence of images. However, this is
not necessarily the case in all situations where bit-stream switching is
needed. Examples include: switching between bit streams arriving from
different cameras capturing the same event from different
perspectives, or from cameras placed around a building for surveillance;
switching to local/national programming or insertion of commercials in
a television broadcast; video bridging, etc. The general term for the
process of concatenating encoded bit streams is splicing.

When switching occurs between bit streams that belong to different
sequences of images, only the encoding of the S-frames used to switch
between bit streams is affected, i.e., the secondary S-frame S12 in
Fig. 5. Specifically, motion-compensated prediction of frames in one
sequence of images using reference frames from a different sequence
of images is not as efficient as when both bit streams belong to the
same sequence of images. In this case, spatial prediction of the
secondary S-frames is most likely to be more efficient. This is
illustrated in Fig. 7, where the switching frame is an SI-frame using
only spatial prediction, which identically reconstructs the
corresponding SP-frame S2. This method can be used as a random access
mechanism into a bit stream and has further implications for error
recovery and resilience, as described in the following.
Error recovery

Multiple representations of a single frame in the form of S-frames
predicted from different reference frames (e.g., predicted from the
immediately preceding reconstructed frame and from a reconstructed
frame further back in time) can be used to increase the error
resilience of an encoded video sequence and/or to improve recovery
from errors in the bit-stream. This is illustrated in Fig. 14. In a
situation where a packet loss occurs during streaming of a pre-encoded
bit stream and a frame or slice is lost, the receiver informs the
transmitter of the lost frame/slice and the transmitter responds by
sending one of the alternative representations of the next S-frame.
The alternative representation, e.g. frame S12 in Fig. 14, uses
reference frames that have already been correctly received by the
receiver. For slice-based packetization and delivery, the sender could
further estimate the slices that would be affected by such a
slice/frame loss and update only those slices in the next S-frame with
their alternative representations.
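The transmitter's choice among alternative representations can be sketched as a simple rule: send the first representation whose reference frames are all known to have been received correctly. The representation names follow Fig. 14, but the reference frame numbers, the preference order and the name `choose_representation` are hypothetical; an SI representation is modelled as one with no reference frames, so it is always decodable.

```python
def choose_representation(representations, received):
    """Return the name of the first representation of the next S-frame whose
    reference frames are all in `received` (ids of correctly received frames).

    representations: list of (name, set of reference frame ids), in order of
    preference; an SI representation has an empty reference set."""
    for name, refs in representations:
        if refs <= received:          # every reference is available at the receiver
            return name
    raise LookupError("no decodable representation available")

reps = [("S2", {1}), ("S12", {0}), ("SI2", set())]   # hypothetical frame ids
print(choose_representation(reps, received={0}))      # prints S12
```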

Similarly, and as mentioned earlier in the discussion of splicing, the
secondary representation of the S-frame can be generated without
using any reference frames, i.e., as an SI2-frame such as that shown in
Fig. 14. In this case, the sender would send the secondary SI-frame,
i.e., SI2 instead of S2, to stop error propagation. This approach can
also be extended in a straightforward manner to slice-based
encoding/packetization. More specifically, the server sends, from the
SI-frame, those slices of the next S-frame that would be affected by
the packet loss.
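For slice-based delivery, the sender has to estimate which slices of the next S-frame are affected by a loss. The sketch below assumes, purely for illustration, that a lost slice can affect slices up to `reach` slice positions away in the next frame (for instance because motion vectors may point across slice boundaries); those slices are taken from the SI-frame, and the rest keep their normal SP representation. The reach model and the function names are hypothetical.

```python
def affected_slices(lost, num_slices, reach=1):
    """Hypothetical estimate: a lost slice may corrupt slices up to `reach`
    positions away in the next frame."""
    hit = set()
    for s in lost:
        for t in range(s - reach, s + reach + 1):
            if 0 <= t < num_slices:
                hit.add(t)
    return hit

def plan_next_s_frame(lost, num_slices, reach=1):
    """Per-slice choice for the next S-frame: 'SI' (taken from the SI-frame)
    for affected slices, 'SP' (normal representation) otherwise."""
    hit = affected_slices(lost, num_slices, reach)
    return ["SI" if s in hit else "SP" for s in range(num_slices)]

print(plan_next_s_frame({2}, num_slices=6))
# prints ['SP', 'SI', 'SI', 'SI', 'SP', 'SP']
```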

Error resilience 

The encoding of a video frame can be performed on a block-by-block
basis, such that differently encoded areas can exist in the same
encoded video frame. For example, some parts of the frame can be inter
coded and other parts of the frame can be intra coded. As already
discussed, intra-block coding does not employ any temporal
correlation, and therefore stops any error propagation that may have
been started by transport impairments.

In lossy transport networks, an intra macroblock refresh strategy can
provide significant error resiliency/recovery performance. In an
interactive client/server scenario, the encoder on the server side
decides how to encode the frames/macroblocks based either on specific
feedback received from the client, e.g., the exact location of the
lost/corrupted frame/slice/macroblock, or on the expected network
conditions calculated through negotiation, or on the measured network
conditions. This kind of intra-macroblock update strategy improves the
quality of the received video by providing error resiliency and error
recovery. The optimal intra-macroblock refresh rate, i.e., the
frequency at which macroblocks are intra-coded, depends on the
transport channel conditions, e.g., packet loss and/or bit error rate.
However, when already encoded bit streams are sent, which is the case
in typical streaming applications, the above strategy cannot be
applied directly. Either the sequence needs to be encoded for the
worst-case expected network conditions, or additional error
resiliency/recovery mechanisms are required.

From the earlier discussion concerning the use of S-frames in error
recovery and splicing applications, it can be noted that S-frames, or
slices within them, can easily be represented as SI-frames/slices that
do not use any reference frames while still leading to identical
reconstruction of the S-frame. This feature can be exploited in the
adaptive intra refresh mechanism discussed above. First, a sequence of
images is encoded with some predefined ratio of S-macroblocks. Then,
during transmission, some of the S-macroblocks are sent in their
secondary representation, that is, as SI-macroblocks. The number of
S-macroblocks to be sent in SI representation can be calculated in a
manner similar to the method used in the real-time encoding/delivery
approach previously described.
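The text does not fix how many S-macroblocks are sent as SI-macroblocks. As a hypothetical example only, the sketch below sends a fraction of the S-macroblocks equal to the estimated loss rate, rounded up, and cycles through them from frame to frame so that every S-macroblock is eventually refreshed. The proportional rule, the cyclic order and the name `si_macroblocks` are assumptions, not the calculation method referred to in the text.

```python
import math

def si_macroblocks(frame_idx, s_mbs, loss_rate):
    """Hypothetical rule: choose which S-macroblocks of frame `frame_idx` are
    sent in their SI representation.

    The count is ceil(loss_rate * len(s_mbs)); the chosen macroblocks advance
    cyclically from frame to frame so that all of them are refreshed in turn."""
    n = len(s_mbs)
    k = min(n, math.ceil(loss_rate * n))
    start = (frame_idx * k) % n if n else 0
    return [s_mbs[(start + i) % n] for i in range(k)]

for f in range(3):
    print(f, si_macroblocks(f, list(range(8)), loss_rate=0.25))
# prints 0 [0, 1], then 1 [2, 3], then 2 [4, 5]
```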
Video Redundancy Coding

S-frames have other uses in applications in which they do not act as
replacements for I-frames. Video Redundancy Coding (VRC) is one
example. The principle of the VRC method is to divide a sequence of
pictures into two or more threads in such a way that all pictures in
the sequence are assigned to one of the threads in a round-robin
fashion. Each thread is coded independently. At regular intervals, all
threads converge into a so-called sync frame. From this sync frame, a
new thread series is started. If one of the threads is damaged, e.g.
because of a packet loss, the remaining threads typically stay intact
and can be used to predict the next sync frame. It is possible to
continue decoding the damaged thread, which leads to slight
degradation in picture quality, or to stop decoding the damaged
thread, which leads to a drop in the frame rate. Sync frames are
always predicted from one of the undamaged threads. This means that
the number of transmitted I-frames can be kept small, because there is
no need for complete re-synchronization. For the sync frame, more than
one representation (P-frame) is sent, each one using a reference frame
from a different thread. Due to the use of P-frames, these
representations are not identical. Therefore, mismatch is introduced
when some of the representations cannot be decoded and their
counterparts are used when decoding the following threads. The use of
S-frames as sync frames eliminates this problem, because all
representations of an S-frame yield an identical reconstruction.
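A hypothetical VRC frame layout can be sketched as follows: frames between sync frames are assigned round-robin to threads, each frame is predicted from the previous frame of its own thread (or from the last sync frame), and each sync frame lists one candidate reference per thread, i.e. one representation per thread. The layout details, including coding the first sync frame without references, are illustrative assumptions. With S-frames as sync frames, all of the listed representations would reconstruct to the same picture.

```python
def vrc_layout(num_frames, num_threads, sync_interval):
    """Hypothetical VRC layout. Returns, per frame, either
    ("sync", [candidate reference frames, one per thread]) or
    (thread_index, [reference frame])."""
    layout = []
    last_sync = 0
    for i in range(num_frames):
        if i % sync_interval == 0:
            # Last frame of each thread since the previous sync frame; the
            # first sync frame is assumed to be coded without references.
            refs = list(range(max(last_sync + 1, i - num_threads), i)) if i else []
            layout.append(("sync", refs))
            last_sync = i
        else:
            thread = (i - last_sync - 1) % num_threads   # round-robin assignment
            prev = i - num_threads                       # previous frame of this thread
            layout.append((thread, [prev if prev > last_sync else last_sync]))
    return layout

for entry in vrc_layout(7, num_threads=2, sync_interval=6):
    print(entry)
```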


It is obvious that the present invention is not limited to the above 
described embodiments but it can be modified within the scope of the 
appended claims. 
Claims:

1. A method for transmitting video information, in which at least a first
bit-stream (510) and a second bit-stream are formed from the video
information, the first bit-stream (510) comprises a first set of frames
comprising at least one video frame, and the second bit-stream (520)
comprises a second set (520) of frames comprising at least one
predictive video frame (524), at least partly different encoding
parameters are used with encoding of the frames of said first bit-stream
(510) and said second bit-stream (520), at least one frame of said first
bit-stream (510) is being transmitted, and the transmission is switched
over from said first bit-stream (510) to said second bit-stream (520),
characterized in that the second bit-stream (520) comprises at least
one primary switching frame (523), in switching the transmission over
from said first bit-stream (510) to said second bit-stream (520) a
secondary switching frame (550) is transmitted, the secondary
switching frame (550) having been encoded using the encoding
parameters of the second bit-stream (520) and at least one reference
frame from the first bit-stream (510), and said secondary switching
frame (550) is used in place of the primary switching frame (523) as a
reference frame to be used in the reconstruction of said at least one
predictive video frame (524) of the second set of video frames.

2. The method according to claim 1, characterized in that the first
bit-stream (510) comprises at least one primary switching frame (513).

3. The method according to claim 1 or 2, characterized in that the
first bit-stream (510) comprises only one intra frame and one primary
switching frame (513) to perform a changeover from one location of the
video information to another location of the video information.

4. The method according to claim 1 or 2, characterized in that the
first bit-stream (510) comprises only intra frames and primary switching
frames (513) to perform a fast-forward operation on the video information.
5. The method according to any of the claims 2 to 4, characterized in
that said primary switching frame (523) is a predictive video frame
(524), in which prediction information comprises only intra prediction
information.

6. The method according to any of the claims 1 to 5, characterized in
that the primary switching frame (513) is formed so that transform
coefficients c_pred are calculated and quantised to form quantised values
of the transform coefficients I_pred, that quantised coefficients I_err for
the prediction error are defined, that dequantised reconstruction image
coefficients d_rec are defined so that coefficients c_rec exist from which,
by quantisation and dequantisation, the dequantised reconstruction image
coefficients d_rec can be obtained, and that the dequantised reconstruction
image coefficients d_rec fulfil one of the following conditions:

$$d_{rec} = d_{pred} + d_{err}, \quad \text{or} \quad c_{rec} = c_{pred} + d_{err},$$

in which d_err are dequantised values of the prediction error, and
d_pred are the dequantised values of the quantised transform
coefficients I_pred.

7. The method according to claim 6, characterized in that the same
quantisation parameters are used with quantisation of the transform
coefficients I_pred and with quantisation of the prediction error.

8. The method according to claim 6, characterized in that different
quantisation parameters are used with quantisation of the transform
coefficients I_pred than with quantisation of the prediction error.

9. The method according to any of the claims 1 to 8, characterized in
that said secondary switching frame (550) is used for recovering from
transmission errors, wherein said primary switching frame (523) is a
predictive video frame (524), in which prediction information comprises
prediction information from video frames earlier than the frame
immediately preceding the predictive video frame (524).






10. The method according to any of the claims 1 to 8, characterized
in that said secondary switching frame (550) is used for recovering
from transmission errors, wherein said primary switching frame (523) is
a predictive video frame (524), in which prediction information
comprises only intra prediction information.

11. The method according to any of the claims 1 to 10, characterized
in that both said at least one primary switching frame (523) and said
secondary switching frame (550) produce the same reconstruction
result of said at least one predictive video frame (524).

12. An encoder comprising means for forming at least a first bit-stream
(510) and a second bit-stream from video information, the first
bit-stream comprises a first set of frames comprising at least one video
frame, and the second bit-stream comprises a second set (520) of
frames comprising at least one predictive video frame (524), and
means for using at least partly different encoding parameters with
encoding of the frames of said first bit-stream (510) and said second
bit-stream (520), means for transmitting at least one frame of said first
bit-stream (510), and means for switching the transmission over from
said first bit-stream (510) to said second bit-stream (520),
characterized in that said means for switching the transmission over
from said first bit-stream (510) to said second bit-stream (520)
comprise means for encoding a secondary switching frame (550) using
the encoding parameters of the second bit-stream (520) and reference
frames from the first bit-stream (510) to enable switching the
transmission over from said first bit-stream (510) to said second
bit-stream (520).

13. The encoder according to claim 12, characterized in that it 
comprises means (1670, 1675) for producing prediction information by 
using said reference frames, and means (1692, 1694) for performing 
quantisation and dequantisation to the prediction information. 

14. A decoder for decoding video information from a signal
comprising frames from at least a first bit-stream (510) and a second
bit-stream, which are formed from the video information, the first
bit-stream comprises a first set of frames comprising at least one video
frame, and the second bit-stream comprises a second set (520) of
frames comprising at least one predictive video frame (524), and at
least partly different encoding parameters are used with encoding of
the frames of said first bit-stream (510) and said second bit-stream
(520), characterized in that the decoder comprises means for
decoding a secondary switching frame (550), said secondary switching
frame (550) having been encoded using the encoding parameters of
the second bit-stream (520) and at least one reference frame from the
first bit-stream (510), and added to said signal in place of a primary
switching frame (523) as a reference frame to be used in the
reconstruction of said at least one predictive video frame (524) of the
second set of video frames, and that said means for decoding a
secondary switching frame (550) comprise means for using decoding
parameters of the second bit-stream (520) and reference frames from
the first bit-stream (510).

15. The decoder according to claim 14, characterized in that the
primary switching frame (513) comprises areas encoded by intra
prediction using only spatial correlation, and areas encoded by inter
prediction using motion compensation, wherein the decoder comprises
means for using motion compensation information with reconstruction,
means for using spatial correlation information with reconstruction, and
switching means to perform the reconstruction of each area either by
said means for using motion compensation information or by said
means for using spatial correlation information, depending on the
prediction method used by encoding of the respective area.

16. A signal representing encoded video information, and comprising
frames from at least a first bit-stream (510) and a second bit-stream,
which are formed from the video information, the first bit-stream
comprises a first set of frames comprising at least one video frame, and
the second bit-stream comprises a second set (520) of frames
comprising at least one predictive video frame (524), at least partly
different encoding parameters are used with encoding of the frames of
said first bit-stream (510) and said second bit-stream (520),
characterized in that, in switching the transmission over from said first
bit-stream (510) to said second bit-stream (520), the second bit-stream
(520) comprises at least one primary switching frame (523), said
signal comprises a secondary switching frame (550) having been
encoded using the encoding parameters of the second bit-stream (520)
and at least one reference frame from the first bit-stream (510), and
said secondary switching frame (550) is used in place of the primary
switching frame (523) as a reference frame to be used in the
reconstruction of said at least one predictive video frame (524) of the
second set of video frames.